Sparse superposition encoder and decoder for communications system

ABSTRACT

A computationally feasible encoding and decoding arrangement and method for transmission of data over an additive white Gaussian noise channel with average codeword power constraint employs sparse superposition codes. The code words are linear combinations of subsets of vectors from a given dictionary, with the possible messages indexed by the choice of subset. An adaptive successive decoder is shown to be reliable with error probability exponentially small for all rates below the Shannon capacity.

RELATIONSHIP TO OTHER APPLICATION

This application claims the benefit of the filing date of U.S. Provisional Patent Application Ser. No. 61/332,407 filed May 7, 2010, Conf. No. 1102 (Foreign Filing License Granted) in the names of the same inventors as herein. The disclosure in the identified United States Provisional Patent Application is incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates generally to data transmission systems, and more particularly, to a system of encoding, transmitting, and decoding data that is fast and reliable, and transmits data at rates near theoretical capacity along a noisy transmission medium.

2. Description of the Prior Art

Historically, in the analog communication era the FCC would allocate a predetermined bandwidth, i.e., frequency band, for transmission of information, illustratively music, over the air as the transmission medium. The signal typically took the form of a sinusoidal carrier wave that was modulated in response to the information. The modulation generally constituted analog modulation, where the amplitude of the carrier signal (AM) was varied in response to the information, or alternatively the frequency of the carrier signal was varied (FM) in response to the information desired to be transmitted. At the receiving end of the transmission, a receiver, typically a radio, consisting primarily of a demodulator, would produce a signal responsive to the amplitude or frequency of the modulated carrier, and eliminate the carrier itself. The received signal was intended to replicate the original information, yet was subjected to the noise of the transmission channel.

During the 1940s, a mathematical analysis was presented that shifted the thinking as to the manner by which information could reliably be transferred. Mathematician Claude Shannon introduced the communications model for coded information, in which an encoder is introduced before any modulation and transmission, and at the receiver a decoder is introduced after any demodulation. Shannon proved that rather than listening to noisy communication, the decoding end of the transmission could essentially recover the originally intended information, as if the noise were removed, even though the transmitted signal could not easily be distinguished in the presence of the noise. One example of this proof is in modern compact disc players, where the music heard is essentially free of noise notwithstanding that the compact disc medium might have scratches or other defects.

In regard to the foregoing, Shannon identified two of the three significant elements associated with reliable communications. The first concerned the probability of error, whereby in an event of sufficient noise corruption, the information cannot be reproduced at the decoder. This established a need for a system wherein, as code length increases, the probability of error decreases, preferably exponentially.

The second significant element of noise removal related to the rate of the communication, which is the ratio of the length of the original information to the length in its coded form. Shannon proved mathematically that there is a maximum rate of transmission for any given transmission medium, called the channel capacity C.

The standard model for noise is Gaussian. In the case of Gaussian noise, the maximum rate of the data channel between the encoder and the decoder as determined by Shannon corresponds to the relationship C=(½)log₂(1+P/σ²), where P/σ² corresponds to the signal-to-noise ratio (i.e., the ratio of the signal power to the noise power, where power is the average energy per transmitted symbol). Here the value P corresponds to a power constraint. More specifically, there is a constraint on the amount of energy that would be used during transmission of the information.
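
The capacity formula is direct to compute. The following minimal Python sketch (the function name and arguments are ours, for illustration only) evaluates C for a given power constraint and noise variance:

```python
import math

def awgn_capacity_bits(P: float, sigma2: float) -> float:
    """Shannon capacity C = (1/2) log2(1 + P/sigma^2), in bits per channel use."""
    return 0.5 * math.log2(1.0 + P / sigma2)

# Example: a signal-to-noise ratio of P/sigma^2 = 7 gives C = 0.5 * log2(8) = 1.5.
print(awgn_capacity_bits(7.0, 1.0))  # 1.5
```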

The third significant element associated with reliable communications is the code complexity, comprising encoder and decoder size (e.g., size of working memory and processors), the encoder and decoder computation time, and the time delay between sequences of source information and decoded information.

To date, no one other than the inventors herein has achieved a computationally feasible, mathematically proven scheme that achieves rates of transmission that are arbitrarily close to the Shannon capacity, while also achieving exponentially small error probability, for an additive noise channel.

Low density parity check (LDPC) codes and so-called “turbo” codes were empirically demonstrated (in the 1990s) through simulations to achieve high rate and small error probability for a range of code sizes, which makes these codes ubiquitous in current coded communications devices, but they lack demonstration of performance scalability for larger code sizes. It has not been proven mathematically that, for rates near capacity, these codes will achieve the low probabilities of error that will in the future be required of communications systems. An exception is the case of the comparatively simple erasure channel, which is not an additive noise channel.

Polarization codes, a class of computationally feasible codes for channels with a finite input set developed within the last three years, do achieve any rate below capacity, but the scaling of the error probability is not as effective.

There is therefore a need for a code system for communications that can be mathematically proven, as well as empirically demonstrated, to achieve the necessary low probabilities of error, high rate, and feasible complexity, for real-valued additive noise channels.

There is additionally a need for a code system that can mathematically be proven to be scalable, wherein the probability of error decreases exponentially as the code word length is increased, with an acceptable scaling of complexity, and at rates of transmission that approach the Shannon capacity.

SUMMARY OF THE INVENTION

The foregoing and other deficiencies in the prior art are solved by this invention, which provides, in accordance with a first apparatus aspect thereof, a sparse superposition encoder for a structured code for encoding digital information for transmission over a data channel. In accordance with the invention, there is provided a memory for storing a design matrix (also called the dictionary) formed of a plurality of column vectors X₁, X₂, . . . , X_(N), each such vector having n coordinates. An input is provided for entering a sequence of input bits u₁, u₂, . . . , u_(K) which determine a plurality of coefficients β₁, . . . , β_(N). Each of the coefficients is associated with a respective one of the vectors of the design matrix to form codeword vectors, with selectable real or complex-valued entries. The entries take the form of superpositions β₁X₁+β₂X₂+ . . . +β_(N)X_(N). The sequence of bits u₁, u₂, . . . , u_(K) constitutes in this embodiment of the invention at least a portion of the digital information.

In one embodiment, the plurality of the coefficients β_(j) have selectably a determined non-zero value, or a zero value. In further embodiments, at least some of the plurality of the coefficients β_(j) have a predetermined value multiplied selectably by +1, or the predetermined value multiplied by −1. In still further embodiments of the invention, at least some of the plurality of the coefficients β_(j) have a zero value, the number of non-zero coefficients being denoted L and the value B=N/L controlling the extent of sparsity, as will be further described in detail herein.

In a specific illustrative embodiment of the invention, the design matrix that is stored in the memory is partitioned into L sections, each such section having B columns, where L>1. In a further aspect of this embodiment, each of the L sections of size B has B memory positions, one for each column of the dictionary, where B has a value corresponding to a power of 2. The positions are addressed (i.e., selected) by binary strings of length log₂(B). The input bit string of length K=L log₂ B is, in some embodiments, split into L substrings, wherein for each section the associated substring provides the memory address of which one column is flagged to have a non-zero coefficient. In an advantageous embodiment, only 1 out of the B coefficients in each section is non-zero.
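
As an illustration of this addressing, the following minimal Python sketch (the helper name and types are ours) splits a K = L·log₂(B) bit message into the L section addresses:

```python
def bits_to_section_indices(bits: str, L: int, B: int) -> list[int]:
    """Split a K = L*log2(B) bit message into L column addresses, one per section.

    Each log2(B)-bit substring is read as the binary memory address of the
    single column flagged (non-zero coefficient) within its section.
    """
    s = B.bit_length() - 1              # log2(B), with B a power of 2
    assert len(bits) == L * s
    return [int(bits[i * s:(i + 1) * s], 2) for i in range(L)]

# Example: L=4 sections of B=8 columns each, so K = 4*3 = 12 message bits.
print(bits_to_section_indices("101001110000", L=4, B=8))  # [5, 1, 6, 0]
```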

In a further embodiment of the invention, the L sections each has allocated a respective power that determines the squared magnitudes of the non-zero coefficients, denoted P₁, P₂, . . . , P_(L), i.e., one from each section. In a further embodiment, the respectively allocated powers sum to a total P to achieve a predetermined transmission power. In still further embodiments, the allocated powers are determined in a set of variable power assignments that permit a code rate up to value C_(B) where, with increasing sparsity B, this value approaches the capacity C=½ log₂(1+P/σ²) for the Gaussian noise channel of noise variance σ².
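
The text above requires only that the section powers sum to P. As one hedged illustration (the exponentially decaying profile below is a choice discussed in the inventors' related manuscript, not the only possibility), a variable power assignment might be computed as follows:

```python
import math

def variable_power_allocation(L: int, P: float, C: float) -> list[float]:
    """Illustrative variable power assignment across the L sections.

    Uses an exponentially decaying profile P_l proportional to exp(-2*C*l/L),
    normalized so that the section powers sum to the total power P.
    """
    raw = [math.exp(-2.0 * C * l / L) for l in range(L)]
    scale = P / sum(raw)
    return [scale * r for r in raw]

powers = variable_power_allocation(L=8, P=7.0, C=1.5)
print(round(sum(powers), 6))  # 7.0, i.e., the total transmission power P
```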

In one embodiment of the invention, the code rate is R=K/n, for an arbitrary R where R<C, for an additive channel of capacity C. In such a case, the partitioned superposition code rate is R=(L log B)/n.

In another embodiment of the invention, there is provided an adder that computes each entry of the codeword as the superposition of the corresponding dictionary elements for which the coefficients are non-zero.

In a still further embodiment of the invention, there are provided n adders for computing the codeword entries as the superposition of selected L columns of the dictionary in parallel. In an advantageous embodiment, before initiating communications, the specified magnitudes are pre-multiplied to the columns of each section of the design matrix X, so that only adders are subsequently required of the encoder processor to form the code-words. Further in accordance with the embodiment of the invention, R/log(B) is arranged to be bounded so that encoder computation time to form the superposition of L columns is not larger than order n, thereby yielding constant computation time per symbol sent.
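
A minimal sketch of this adder-only encoding, assuming the columns have already been pre-multiplied by the section magnitudes (the helper name and the NumPy representation are ours):

```python
import numpy as np

def encode_codeword(X_scaled: np.ndarray, indices: list[int], B: int) -> np.ndarray:
    """Form the codeword as a sum of one pre-scaled column per section.

    X_scaled is the n-by-N design matrix whose columns were pre-multiplied by
    the section magnitudes sqrt(P_l), so encoding reduces to n-vector additions.
    indices[l] is the memory address of the flagged column within section l.
    """
    codeword = np.zeros(X_scaled.shape[0])
    for l, j in enumerate(indices):
        codeword += X_scaled[:, l * B + j]   # one adder pass per selected column
    return codeword
```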

In another embodiment of the invention, the encoder size complexity is not more than the nBL memory positions to hold the design matrix and the n adders. Moreover, in some embodiments, the value of B is chosen to be not more than a constant multiplied by n, whereupon also L is not more than n divided by a logarithmic factor. In this manner, the encoder size complexity nBL is not more than n³. Also, R/log(B) is, in some embodiments, chosen to be small.

In yet another embodiment of the invention, the input to the code arises as the output of a Reed-Solomon outer code of alphabet size B and length L. This serves to maintain an optimal separation, the distance being measured by the fraction of distinct selections of non-zero terms.

In accordance with an advantageous embodiment of the invention, the dictionary is generated by independent standard normal random variables. Preferably, the random variables are provided to a specified predetermined precision. In accordance with some aspects of this embodiment of the invention, the dictionary is generated by independent, equiprobable, +1 or −1, random variables.

In accordance with a second apparatus aspect of the invention, there is provided an adaptive successive decoder for a structured code whereby digital information that has been received over a transmission channel is decoded. There is provided in this embodiment of the invention a memory for storing a design matrix (dictionary) formed of a plurality of vectors X₁, X₂, . . . X_(N), each such vector having n coordinates. An input receives the digital information Y from the transmission channel, the digital information having been encoded as a plurality of coefficients β₁ . . . β_(N), each of the coefficients being associated with a respective one of the vectors of the design matrix to form codeword vectors in the form of superpositions β₁X₁+β₂X₂+ . . . +β_(N)X_(N). The superpositions have been distorted during transmission to the form Y. In addition, there is provided a first inner product processor for computing inner products of Y with each of the plurality of vectors X₁, X₂, . . . X_(N), stored in the memory to determine which of the inner products has a value above a predetermined threshold value.

In one embodiment of this second apparatus aspect of the invention, the first inner product processor performs a plurality of first inner products in parallel.

As a result of data transmission over a noisy channel, the resulting distortion of the superpositions is responsive to an additive noise vector ξ, having a distribution N(0,σ²I).

In a further embodiment of the invention, there is additionally provided a processor of adders for superimposing the columns of the design matrix that have inner product values above the predetermined threshold value. The columns are flagged, and the superposition of flagged columns is termed the “fit.” In the practice of this second aspect of the invention, the fit is subtracted from Y, leaving a residual vector r. A further inner product processor, in some embodiments, receives the residual vector r and computes selected inner products of the residual r with each of the plurality of vectors X₁, X₂, . . . X_(N), not previously flagged, for determining which of these columns to flag as having therein an inner product value above the predetermined threshold value.
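
A schematic rendering of this flag-and-subtract loop follows. It is a minimal sketch, assuming test statistics normalized by the residual norm and a single fixed threshold; the precise statistics and threshold schedule of the invention are specified later in this disclosure.

```python
import numpy as np

def adaptive_successive_decode(X: np.ndarray, Y: np.ndarray,
                               threshold: float, max_steps: int) -> set[int]:
    """Flag columns whose inner product with the residual is above threshold,
    add the flagged columns to the fit, subtract, and repeat on the residual."""
    flagged: set[int] = set()
    residual = Y.copy()
    for _ in range(max_steps):
        stats = X.T @ residual / np.linalg.norm(residual)
        hits = [j for j in range(X.shape[1])
                if j not in flagged and stats[j] > threshold]
        if not hits:
            break                              # no inner product above threshold
        flagged.update(hits)
        residual -= X[:, hits].sum(axis=1)     # subtract the new fit components
    return flagged
```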

In an advantageous embodiment of the invention, upon receipt of the residual vector r by the further inner product processor, a new Y vector is entered into the input so as to achieve simultaneous pipelined inner product processing.

In a further embodiment of the invention, there are additionally provided k−2 further inner product processors arranged to operate sequentially with one another and with the first and further inner product processors. The additional k−2 inner product processors compute selected inner products of respectively associated ones of residuals with each of the plurality of vectors X₁, X₂, . . . X_(N), not previously flagged, to determine which of these columns to flag as having therein an inner product value above the predetermined threshold value. In accordance with some embodiments, the k inner product processors are configured as a computational pipeline to perform simultaneously their respective inner products of residuals associated with a sequence of corresponding received Y vectors.

In a still further embodiment, in each of the inner product processors there are further provided N accumulators, each associated with a respective column of the design matrix memory to enable parallel processing. In another embodiment, each of the inner product processors is further provided with N multipliers, each of which is associated with a respective column of the design matrix, for effecting parallel processing.

In yet another embodiment, there are further provided a plurality of comparators for comparing respective ones of the inner products with a predetermined threshold value. The plurality of comparators, in some embodiments, store a flag responsive to the comparison of the inner products with the predetermined threshold value.

In accordance with a specific illustrative embodiment of the invention, there is provided a processor for computing the superposition of fit components from the flagged columns of the design matrix. In a further embodiment, there are provided k−1 inner product processors, each of which computes, in succession, the inner product of each column that has not previously been flagged, with a linear combination of Y and of previous fit components.

In a still further embodiment of the invention, there is provided a processor by which the received Y and the fit components are successively orthogonalized.

The linear combination of Y and of previous fit components is, in one embodiment, formed with weights responsive to observed fractions of previously flagged terms. In a further embodiment, the linear combination of Y and of previous fit components is formed with weights responsive to expected fractions of previously flagged terms. In a highly advantageous embodiment of the invention, the expected fractions of flagged terms are determined by an update function processor.

The expected fractions of flagged terms are pre-computed, in some embodiments, before communication, and are stored in a memory of the decoder for its use.

The update function processor determines g_(L)(x), which evaluates the conditional expected total fraction of flagged terms on a step if the fraction on the previous step is x. The fraction of flagged terms is, in some embodiments of the invention, weighted by the power allocation divided by the total power. In further embodiments of the invention, the update function processor determines g_(L)(x) as a linear combination of probabilities of events of inner product above threshold.
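
A schematic evaluation of such an update function is sketched below. It is a hedged illustration only, assuming Gaussian test statistics with the mean-shift form used in the inventors' manuscript; the symbols n, tau, and the helper name are ours.

```python
import math
from statistics import NormalDist

def g_L(x: float, powers: list[float], P: float, sigma2: float,
        n: int, tau: float) -> float:
    """Expected weighted fraction of flagged terms after a step, given that a
    weighted fraction x was flagged on the previous step.

    Each section l contributes its power weight P_l/P times the probability
    that a Gaussian test statistic with mean shift mu_l(x) exceeds tau.
    """
    Phi = NormalDist().cdf
    total = 0.0
    for P_l in powers:
        mu_l = math.sqrt(n * P_l) / math.sqrt(sigma2 + P * (1.0 - x))
        total += (P_l / P) * Phi(mu_l - tau)
    return total
```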

In accordance with a further embodiment of the invention, the dictionary is partitioned whereby each column has a distinct memory position and the sequence of binary addresses of flagged memory positions forms the decoded bit string. In one embodiment, the occurrence of more than one flagged memory position in a section or no flagged memory positions in a section is denoted as an erasure, and one incorrect flagged position in a section is denoted as an error. In a still further embodiment, the output is provided to an outer Reed-Solomon code that completes the correction of any remaining small fraction of section mistakes.

In accordance with a third apparatus aspect of the invention, there is provided a performance-scalable structured code system for transferring data over a data channel having code rate capacity of C. The system is provided with an encoder for specified system parameters of input-length K and code-length n. The encoder is provided with a memory for storing a design matrix formed of a plurality of vectors X₁, X₂, . . . X_(N), each such vector having n coordinates. Additionally, the encoder has an input for entering a plurality of bits u₁, u₂, . . . , u_(K) that determine a plurality of coefficients β₁, . . . , β_(N). Each of the coefficients is associated with a respective one of the vectors of the design matrix to form codeword vectors in the form of superpositions β₁X₁+β₂X₂+ . . . +β_(N)X_(N). There is additionally provided on the encoder an output for delivering the codeword vectors to the transmission channel. In addition to the foregoing, there is provided a decoder having an input for receiving the superpositions β₁X₁+β₂X₂+ . . . +β_(N)X_(N), the superpositions having been distorted during transmission to the form Y. The decoder is further provided with a first inner product processor for computing inner products of Y with each of the plurality of vectors X₁, X₂, . . . , X_(N), stored in the memory to determine which of the inner products has a value above a predetermined threshold value.

In one embodiment of this third apparatus aspect of the invention, the encoder and decoder adapt a choice of system parameters to produce the smallest error probability for a specified code rate and an available code complexity.

In a further embodiment, in response to the code-length n, the choice of system parameters has exponentially small error probability and a code complexity scaling not more than n³ for any code rate R less than the capacity C for the Gaussian additive noise channel.

In an advantageous embodiment of the invention, there is provided a performance-scale processor for setting of system parameters K and n. The performance-scale processor is responsive to specification of the power available to the encoder of the channel. In other embodiments, the performance-scale processor is responsive to specification of noise characteristics of the channel. In still further embodiments, the performance-scale processor is responsive to a target error probability. In yet other embodiments, the performance-scale processor is responsive to specification of a target decoder complexity. In still other embodiments, the performance-scale processor is responsive to specification of a rate R, being any specified value less than C.

In some embodiments of the invention the performance-scale processor sets values of L and B for partitioning of the design matrix. In respective other embodiments, the performance-scale processor:

-   sets values of power allocation to each non-zero coefficient;
-   sets values of threshold of inner product test statistics; and
-   sets a value of the maximum number of steps of the decoder.

In an advantageous embodiment, the decoder includes multiple inner product processors operating in succession to provide steps of selection of columns of the fit. The performance-scale processor performs successive evaluations of an update function g_(L)(x) specifying an expected total fraction of correctly flagged terms on a step if the total fraction on the previous step were equal to the value x.

In further embodiments, the fraction of flagged terms is weighted by the power allocation divided by the total power. The update function processor, in some embodiments, determines g_(L)(x) as a linear combination of probabilities determined for events of an inner product above threshold. The system parameters are, in some embodiments, optimized by examination of a sequence of system parameters and performing successive evaluations of the update function for each.

In the practice of some embodiments of the invention, the encoder is a sparse superposition encoder. The decoder is in some embodiments an adaptive successive decoder. An outer Reed-Solomon encoder is, in some embodiments, matched to the inner sparse superposition encoder. In other embodiments, the outer Reed-Solomon decoder is matched to the adaptive successive decoder.

In a specific illustrative embodiment of the invention, the encoder is a sparse partitioned superposition encoder, and the decoder is an adaptive successive decoder, with parameter values set at those demonstrated to produce exponentially small probability of more than a small specified fraction of mistakes, with exponent responsive to C_(B)−R. The decoder has a hardware-implemented size-complexity not more than a constant times n²B, a constant time-complexity rate, a delay not more than a constant times log(B), and a code rate R up to a value C_(B) of quantified approach to the channel capacity C of the Gaussian noise channel as the values of n and B are scaled to account for increasing density and the extent of computer memory and processors.

Further in accordance with this embodiment, the inner-code sparse superposition encoder and decoder having the characteristics hereinabove set forth are matched to an outer Reed-Solomon encoder and decoder, respectively, whereby mistakes are corrected, except in an event of an exponentially small probability.

In accordance with a method aspect of the invention there is provided a method of decoding data, the method including the steps of:

computing inner products of a received signal Y with each column of a design matrix X stored in memory;

identifying ones of the inner products that exceed a predetermined threshold value;

forming an initial fit fit₁; and then, in succession for k>1,

computing inner products of residuals Y−fit_(k−1), with each column of X;

identifying columns for which the inner product exceeds a predetermined threshold value;

adding those columns for which the inner product exceeds the predetermined threshold value to the fit fit_(k−1); and

selectably terminating computing of inner products of residuals when k is a specific multiple of log B or when no inner products of residuals exceed a predetermined threshold value.

In one embodiment of this method aspect of the invention, there are provided the further steps of:

subtracting a sum of the identified products that exceed a predetermined threshold value from Y to form a residual r; and

computing inner products of the residual r with each column of the design matrix X.

In a further embodiment, there are further provided the steps of:

repeating the steps of subtracting a sum of the identified products that exceed a predetermined threshold value from Y to form a residual r and computing inner products of the residual r with each column of the design matrix X; and

selectably terminating computing of inner products of residuals when k is a specific multiple of log B or when no inner products of residuals exceed a predetermined threshold value.
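
Tying the method steps together, the following purely illustrative usage sketch composes the hypothetical helpers from the earlier sketches (bits_to_section_indices, encode_codeword, adaptive_successive_decode); the sizes, scaling, and threshold below are not tuned values from the invention:

```python
import numpy as np

rng = np.random.default_rng(0)
n, L, B = 1024, 32, 64                         # illustrative sizes; N = L*B
X = rng.standard_normal((n, L * B))            # dictionary of standard normals

bits = "".join(rng.choice(list("01"), size=L * 6))   # K = L*log2(B) = 192 bits
indices = bits_to_section_indices(bits, L, B)
codeword = encode_codeword(X, indices, B)      # unit magnitudes, for simplicity

Y = codeword + rng.standard_normal(n)          # additive Gaussian noise channel
flagged = adaptive_successive_decode(
    X, Y, threshold=np.sqrt(2 * np.log(B)), max_steps=16)
```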

BRIEF DESCRIPTION OF THE DRAWING

Comprehension of the invention is facilitated by reading the following detailed description, in conjunction with the annexed drawing, in which:

FIG. 1 is a simplified schematic representation of an encoder constructed in accordance with the principles of the invention arranged to deliver data encoded as coefficients multiplied by values stored in a design matrix to a data channel having a predetermined maximum transmission capacity, the encoded data being delivered to a decoder that contains a local copy of a design matrix and a coefficient extraction processor, the decoder being constructed in accordance with the principles of the invention;

FIG. 2 is a simplified schematic representation of an encoder constructed in accordance with the principles of the invention arranged to deliver data encoded as coefficients multiplied by values stored in a design matrix to a data channel having a predetermined maximum transmission capacity, the encoded data being delivered to a decoder that contains a local copy of a design matrix and an adaptive successive decoding processor, the decoder being constructed in accordance with the principles of the invention;

FIG. 3 is a simplified schematic representation of an encoder constructed in accordance with the principles of the invention arranged to deliver data encoded as coefficients multiplied by values stored in a design matrix to a data channel having a predetermined maximum transmission capacity, the encoded data being delivered to a decoder that contains a plurality of local copies of a design matrix, each associated with a sequential pipelined decoding processor constructed in accordance with the principles of the invention that achieves continuous data decoding;

FIG. 4 is a graphical representation of a plot of the function g_(L)(x), wherein the dots indicate the sequence q_(1,k)^(adj) for the 16 steps. B=2¹⁶, snr=7, R=0.74 and L is taken to be equal to B (note: “snr” denotes the signal-to-noise ratio). The height reached by the g_(L)(x) curve at the final step corresponds to a weighted correct detection rate target of 0.993, un-weighted 0.986, for a failed detection rate target of 0.014. The accumulated false alarm rate bound is 0.008. The probability of mistake rates larger than these targets is bounded by 4.8×10⁻⁴;

FIG. 5 is a graphical representation of a progression of a specific illustrative embodiment of the invention, wherein snr=15. The weighted (unweighted) detection rate is 0.995 (0.983) for a failed detection rate of 0.017, and the false alarm rate is 0.006. The probability of mistakes larger than these targets is bounded by 5.4×10⁻⁴;

FIG. 6 is a graphical representation of a progression of a specific illustrative embodiment of the invention, wherein snr=1. The detection rate (both weighted and un-weighted) is 0.944 and the false alarm and failed detection rates are 0.016 and 0.056 respectively, with the corresponding error probability bounded by 2.1×10⁻⁴;

FIG. 7 is a graphical representation of a plot of an achievable rate as a function of B for snr=15. Section error rate is controlled to be between 9 and 10%. For the curve using simulation runs, the rates are exhibited for which the empirical probability of making more than 10% section mistakes is near 10⁻³;

FIG. 8 is a graphical representation of a plot of an achievable rate as a function of B for snr=7. Section error rate is controlled to be between 9 and 10%. For the curve using simulation runs, the rates are exhibited for which the empirical probability of making more than 10% section mistakes is near 10⁻³; and

FIG. 9 is a graphical representation of a plot of an achievable rate as a function of B for snr=1. Section error rate is controlled to be between 9 and 10%. For the curve using simulation runs, the rates are exhibited for which the empirical probability of making more than 10% section mistakes is near 10⁻³.

DETAILED DESCRIPTION Glossary of Terms

Adaptive Successive Decoder: An iterative decoder for a partitioned superposition code with hardware design in which an overlapping sequence of sections is tested each step to see which have suitable test statistics above threshold, and flags these to correspond with decoded message segments. The invention herein determines that an adaptive successive decoder is mistake rate scalable for all code rates R<C_(B) for the Gaussian channel, with size complexity n²B, delay snr log B, constant time complexity rate, and a specified sequence C_(B) that approaches the capacity C with increasing B. Thereby, when composed with a suitable outer code, it is performance scalable with exponentially small error probability and specified control of complexity. [See also: Partitioned Superposition Code, Fixed Successive Decoder, Code-Rate, Complexity, Mistake Rate Scalability, Performance Scalability.]

Additive Noise Channel: A channel specification of the form Y=c+ξ where c is the codeword, ξ is the noise vector, and Y is the vector of n received values. Such a discrete-time model arises from consideration of several steps of the communication system (modulator, transmitter, transmission channel, demodulator and filter) as one code channel for the purpose of focus on issues of coding. It is called a White Additive Noise Channel if the successive values of the resulting noise vector are uncorrelated (wherein the purpose of the filter, a so-called whitening filter, is to produce a successive removal of any such correlation in the original noise of the transmission channel). A channel in which the transmitted signal is attenuated (faded) by a determined amount is converted to an additive noise channel by a rescaling of the magnitude of the received sequence.

Block Error (also called “error”): The event that the decoded message of length K is not equal to the encoded message string. Associated with it is the block error probability.

Channel Capacity C (also called Shannon Capacity): For any channel it is the supremum (least upper bound) of code rates R at which the message is decoded, where for any positive error probability there is a sufficiently complex encoder/decoder pair that has code rate R and error probability as specified. By Shannon theory it is explicitly expressed via specification of signal power and channel characteristics. [See also “Channel Code System,” “Code Rate,” and “Error Probability.”]

Channel Code System: A distillation of the ingredients of the parts of a communication system that focus on the operation and performance of the pathway in succession starting with the message bit sequence, including the channel encoder, the channel mapping the sequence of values to be sent to the values received at each recipient, the channel decoder (one for each recipient), and the decoded message bit sequence (one at each recipient), the purpose of which is to avoid the mistakes that would result from transmission without coding, and the cost of which will be expressed in coding complexity and in limitations on code rate. [See also: “Communication System.”]

Channel Decoder (also called the “decoder”): A device for converting n received values (the received vector Y) into a string of K bits, the decoded message.

Channel Encoder (also called the “encoder”): A device for converting a K bit message into a string of n values (the codeword) to be sent across a channel, the values being of a form (binary or real or complex) as is suited to the use of the channel. The value n is called the code-length or block-length of the code.

Code Complexity (also called “computational complexity”): Encompasses size complexity, time complexity, and delay. Specific to a hardware instantiation, the size complexity is the sum of the number of fixed memory locations, the number of additional memory locations for workspace of the decoding algorithm, and the number of elementary processors (e.g., multiplier/accumulators, adders, comparators, memory address selectors) that may act in parallel in encoding and decoding operation. With reference to the decoder, the time complexity (or time complexity rate) is the number of elementary operations performed per pipelined received string Y divided by the length of the string n. Time complexity of the encoder is defined analogously. With reference to the decoder, the delay is defined as the count of the number of indices between the flow of received strings Y and the flow of decoded strings concurrently produced.

Code Rate: The ratio R=K/n of the number of message bits to the number of values to be sent across a channel.

Communication System: A system for electronic transfer of source data (voice, music, data bases, financial data, images, movies, text, computer files for radio, telephone, television, internet) through a medium (wires, cables, or electromagnetic radiation) to a destination, consisting of the sequence of actions of the following devices: a source compression (source encoder) providing a binary sequence rendering of the source (called the message bit sequence); a channel encoder providing a sequence of values for transmission, a modulator, a transmitter, a transmission channel, and, for each recipient, a receiver, a filter, a demodulator, a channel decoder, and a source decoder, the outcome of which is a desirably precise rendering of the original source data by each recipient.

Computational Complexity: See, “Code Complexity.”

Decoder: (See, “Channel Decoder”)

Design Matrix (or “Dictionary”): An n by N matrix X of values known to the encoder and decoder. In the context of coding these matrices arise by strategies called superposition codes, in which the codewords are linear combinations of the columns of X, in which case power requirements of the code are maintained by restrictions on the scale of the norms of these columns and their linear combinations.

Dictionary: See, “Design Matrix.”

Encoder Term Specification: Each partitioned message segment of length log(B) gives the memory position at which it is flagged that a column is selected (equivalently, it is a flagged entry indicating a non-zero coefficient).

(Fixed) Successive Decoder: An iterative decoder for a partitioned superposition code in which it is pre-specified what sequence of sections is to be decoded. Fixed successive decoders that have exponentially small error probability at rates up to capacity unfortunately have exponentially large complexity.

Fraction of Section Mistakes (also called “section error rate”): The proportion of the sections of a message string that are in error. For partitioned superposition codes with adaptive successive decoder, by the invention herein, the mistake rate is likely not more than a target mistake rate, which is an expression inversely proportional to s=log₂(B).

Gaussian Channel: An additive noise channel in which the distribution of the noise is a mean zero normal (Gaussian) with the specified variance σ². Assumed to be whitened, it is an Additive White Gaussian Noise Channel (AWGN). The Shannon Capacity of this channel is C=½ log₂(1+P/σ²).

Iterative Decoder: In the context of superposition codes, an iterative decoder is one in which there are multiple steps, where for each step there is a processor that takes the results of previous steps and updates the determination of terms of the linear combination that provided the codeword.

Message Sections: A partition of a length K message string into L sections of length s, each of which conveys a choice of B=2^(s), with which K=Ls. Typical values of s are between about 8 and about 20, though no restriction is made. Also typical is for B and L to be comparable in value, and so too is the codelength n, via the rate relationship nR=K=L log₂(B). That is, the number L of sections matches the codelength n to within a logarithmic factor.

Message Strings: The result of breaking a potentially unending sequence of bits into successive blocks of length denoted as K bits, called the sequence of messages. Typical values of K range from several hundred to several tens of thousands, though no restriction is made.

Noise Variance σ²: The expected squared magnitude of the components of the noise vector in an Additive White Noise Channel.

Outer Code for Inner Code Mistake Correction: A code system in which an outer code such as a Reed-Solomon code is concatenated (composed) with another code (which then becomes the inner code) for the purpose of correction of the small fraction of mistakes made by the inner code. Such a composition converts a code with specified mistake-rate scaling into a code with exponentially small error probability, with slight overall code rate reduction by an amount determined by the target mistake rate. [See also “Fraction of Section Mistakes” and “Scaling of Section Mistakes.”]

Performance Scalable Codes: A performance scalable code system (as first achieved for additive noise channels by the invention herein) is a structured code system in which, for any code rate R less than channel capacity, the codes have scalable error probability with an exponent E depending on C−R and scalable complexity with space complexity not more than a specific constant times n³, delay not more than a constant times log n, and time complexity rate not more than a constant, where the constants depend only on the power requirement and the channel specification (e.g., through the signal-to-noise ratio snr=P/σ²). [See also “Structured Code System,” “Scaling of Error Probability,” and “Scaling of Code Complexity.”]

Partitioned Superposition Code: A sparse superposition code in which the N columns of the matrix X are partitioned into L sections of size B, with at most one column selected in each section. It is a structured code system, because a size (n′,L′) code is nested in a larger size (n,L) code by restricting to the use of the first L′<L sections and the first n′<n coordinates in both encoding and decoding. [See also “Sparse Superposition Code” and “Structured Code System.”]

Scaling of Code Complexity: For a specified sequence of code systems with specific hardware instantiation, the scaling of complexity is the expression of the size complexity, time complexity, and delay as typically non-decreasing functions of the code-length n. In formal computer science, feasibility is associated with there being a finite power of n that bounds these expressions of complexity. For communication industry purposes, implementation has more exacting requirements. The space complexity may coincide with expressions bounded by n² or n³, but certainly not n⁵ or greater. Refined space complexity expressions arise with specific codes in which there can be dependence on products of other measures of size (e.g., n, L and B for the dictionary-based codes invented herein). The time complexity rate needs to be constant to prevent computation from interfering with communication rate in the setting of a continuing sequence of received Y vectors. It is not known what are the best possible expressions for delay; it may be necessary to permit it to grow slowly with n, e.g., logarithmically, as is the case for the designs herein. [See also “Code Complexity.”]

Scaling of Error Probability: For a sequence of code systems of code-length n, an exponential error probability (also called exponentially small error probability or geometrically small error probability) is a probability of error not more than a value of the form 10^(−nE), with a positive exponent E that conveys the rate of change of log probability with increasing code-length n. Probabilities bounded by values of the form n^(w)10^(−nE) are also said to be exponentially small, where w and E depend on characteristics of the code and the channel. When a sequence of code systems has error probability governed by an expression of the form 10^(−LE) indexed by a related parameter (such as the number of message sections L) that agrees with n to within a log factor, we still say that the error probability is exponentially small (or more specifically that it is exponentially small in L). By Shannon theory, the exponent E can be positive only for code rates R<C.

Scaling of Section Mistake Rates: A code system with message sections is said to be mistake rate scalable if there is a decreasing sequence of target mistake rates approaching 0, said target mistake rate depending on s=log(B), such that the probability with which the mistake rate is greater than the target is exponentially small in L/s, with exponent positive for any code rate R<C_(B), where C_(B) approaches the capacity C.

Section Error (also called a “section mistake”): When the message string is partitioned into sections (sub-blocks), a section error is the event that the corresponding portion of the decoded message is not equal to that portion of the message.

Section Error Rate: See, “Fraction of Section Mistakes.”

Section Mistake: See, “Section Error.”

Shannon Capacity: See, “Channel Capacity.”

Signal Power: A constraint on the transmission power of a communication system that translates to the average squared magnitude P of the sequence of n real or complex values to be sent.

Sparse Superposition Code: A code in which there is a design matrix for which codewords are linear combinations of not more than a specified number L of columns of X. The message is specified by the selection of columns (or equivalently by the selection of which coefficients of the linear combination are non-zero).

Structured Code System: A set of code systems, one for each block-length n, in which there is a nesting of ingredients of the encoder and decoder design of smaller block-lengths as specializations of the designs for larger block-lengths.

DESCRIPTION OF SPECIFIC ILLUSTRATIVE EMBODIMENTS OF THE INVENTION

FIG. 1 is a simplified schematic representation of a transmission system 100 having an encoder 120 constructed in accordance with the principles of the invention arranged to deliver data encoded as coefficients multiplied by values stored in a design matrix to a data channel 150 having a predetermined maximum transmission capacity, the encoded data being delivered to a decoder 160 that contains a local copy of the design matrix 165 and a coefficient extraction processor 170.

Encoder 120 contains, in this specific illustrative embodiment of the invention, a logic unit 125 that receives message bits u=(u₁, u₂, . . . u_(K)) and maps these into a sequence of flags (one for each column of the dictionary), wherein flag=1 specifies that the column is to be included in the linear combination (with non-zero coefficient) and flag=0 specifies that the column is excluded (i.e., is assigned a zero coefficient value).

In the specific embodiment of partitioned superposition codes, in each section of the partition only one out of the B columns is selected to have flag=1. In this embodiment, the action of logic unit 125 is to parse the message bit stream into segments each having log₂(B) bits, said bit stream segments specifying (selecting, flagging) the memory address of the one chosen column in the corresponding section of a design matrix 130, said selected column to be included in the superposition with a non-zero coefficient value, whereas all non-selected columns are not included in the superposition (i.e., have a zero coefficient value and a flag value set to 0).

It is a highly advantageous aspect of the present invention that those who would design a communications system in accordance with the principles of the present invention are able to use any size of design matrix. More specifically, as the size of the design matrix is increased, the size of the input stream K=L log(B) and the codelength n are increased (wherein the ratio is held fixed at code rate R=K/n), said increase in K and n resulting in an exponentially decreasing probability of decoding error, as will be described in detail in a later section of this disclosure. Such adaptability of the present communications system enables the system to be tailored to the quality of the receiving device or to the error requirements of the particular application.

Design matrix 130 is in the form of a memory that contains random values X (not shown). In this embodiment of the invention, a succession of flags of columns for linear combination, as noted above, is received by the design matrix from logic unit 125, wherein columns with flag=1 are included in the linear combination (with non-zero coefficient value) and columns with flag=0 are excluded from the linear combination (zero coefficient value). The information that is desired to be transmitted, i.e., the message, is thereby contained in the choice of the subset included with non-zero coefficients. Each coefficient is combined with a respectively associated one of the columns of X contained in design matrix 130, and said columns are superimposed, thus forming a code word Xβ=β₁X₁+β₂X₂+ . . . +β_(N)X_(N). Thus, the data that is delivered to data channel 150 comprises a sequence of linear combinations of the selected columns of the design matrix X.

Data channel 150 is any channel that takes real or complex inputs and produces real or complex received values, subjected to noise, said data channel encompassing several parts of a standard communication system (e.g., the modulator, transmitter, transmission channel, receiver, filter, demodulator), said channel having a Shannon capacity, as hereinabove described. Moreover, it is an additive noise channel in which the noise may be characterized as having a Gaussian distribution. Thus, a code word Xβ is issued by encoder 120, propagated along noisy data channel 150, and the resulting vector Y is received at decoder 160, where Y is the sum of the code word plus the noise, i.e., Y=Xβ+ξ, where ξ is the noise vector.
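
For simulation purposes, the channel action reduces to one line. The following minimal sketch (the function name is ours) adds i.i.d. N(0, σ²) noise to a codeword:

```python
import numpy as np

def awgn_channel(codeword: np.ndarray, sigma: float,
                 rng: np.random.Generator) -> np.ndarray:
    """Simulate Y = X*beta + xi, with xi drawn i.i.d. from N(0, sigma^2)."""
    return codeword + sigma * rng.standard_normal(codeword.shape)
```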

Decoder 160, in this simplified specific illustrative embodiment of the invention, receives Y and subjects it to a decoding process directed toward the identification and extraction of the coefficients β_(j). More specifically, a coefficient extraction processor 170, which ideally would approximate the functionality of a theoretically optimal least squares decoder processor, would achieve the stochastically smallest distribution of mistakes. A comparison of the present practical decoder to the theoretically optimum least squares decoder of the sparse superposition codes is set forth in a subsequent section of this disclosure.

In addition to the foregoing, decoder 160 is, in an advantageous embodiment of the invention, provided with an update function generator 175 that is useful to determine an update function g_(L)(x) that identifies the likely performance of iterative decoding steps. In accordance with this aspect of the invention, the update function g_(L)(x) is responsive to signal power allocation. In a further embodiment of the invention, update function g_(L)(x) is responsive to the size B and the number L of columns of the design matrix. In a still further embodiment, the update function is responsive to a rate characteristic R of data transmission and to a signal-to-noise ratio (“snr”) characteristic of the data transmission. Specific characteristics of update function g_(L)(x) are described in greater detail below in relation to FIGS. 4-6.

FIG. 2 is a simplified schematic representation of a transmission system 200 constructed in accordance with the principles of the invention. Elements of structure or methodology that have previously been discussed are similarly designated. Transmission system 200 is provided with encoder 120 arranged to deliver data encoded as coefficients multiplied by values stored in a design matrix to a data channel 150 having a predetermined maximum transmission capacity, as described above.

The encoded data is delivered to a decoder 220 that contains a local copy of a design matrix 165 and a control processor 240 that functions in accordance with the steps set forth in function block 250, denominated “adaptive successive decoding.” Local design matrix 165 is substantially identical to encoder design matrix 130. As shown in this figure, adaptive successive decoding at function block 250 includes the steps of:

Computing inner products of Y with respectively associated columns of X from local design matrix 165;

Identifying ones of the inner products that exceed a predetermined threshold value;

Forming an initial fit;

Computing iteratively inner products of residuals Y−fit_(k−1) with each remaining column of X;

Identifying the columns for which the inner products exceed a predetermined threshold value and adding them to the fit; and

Stopping the process at a step where k is a specific multiple of log B, or when no inner products exceed the threshold.

FIG. 3 is a simplified schematic representation of a transmission system 300 constructed in accordance with the principles of the invention. Elements of structure that have previously been discussed are similarly designated. As previously noted, an encoder 120 delivers data encoded in the form of a string of flags specifying a selection of a set of non-zero coefficients, as hereinabove described, which linearly combine columns of X stored in a design matrix 130, to data channel 150 that, as noted, has a predetermined maximum capacity. The encoded data is delivered to a decoder system 320 that contains a plurality of local copies of a design matrix, each associated with a sequential decoding processor, the combination of local design matrix and processor being designated 330-1, 330-2, to 330-k. Decoder system 320 is constructed in accordance with the principles of the invention to achieve continuous data decoding.

In operation, a vector Y corresponding to a received signal is comprised of an original codeword Xβ plus a noise vector ξ. Noise vector ξ derives from data channel 150, and therefore Y=Xβ+ξ. Noise vector ξ may have a Gaussian distribution. Vector Y is delivered to the processor in 330-1, where its inner product is computed with each column of X in associated local design matrix 330-1. As set forth in the steps in function block 350-1, each inner product then is compared to a predetermined threshold value, and if it exceeds the threshold, it is flagged and the associated columns are superimposed in a fit. The fit is then subtracted from Y, leaving a residual that is delivered to processor 330-2.

Upon delivery of the residual to processor 330-2, a process step 360 causes a new value of Y to be delivered from data channel 150 to processor 330-1. Thus, the processors are continuously active computing residuals.

Processor 330-2 computes the inner product of the residual it received from processor 330-1, in accordance with the process of function block 350-2. This is effected by computing the inner product of the residual with each not-already-flagged column of X in the local design matrix associated with processor 330-2. Each of these inner products then is compared to a predetermined threshold value, and if it exceeds the threshold, each such associated column is flagged and added to the fit. The fit is then subtracted from the residual, leaving a further residual that is delivered to the processor 330-3 (not shown). The process terminates when no columns of X have inner product with the residual exceeding the threshold value, or when processing has been completed by processor 330-k, at which point the sequence of addresses of the columns flagged as having inner product above threshold provides the decoded message.

1. Introduction

For the additive white Gaussian noise channel with average codeword power constraint, sparse superposition codes are developed, in which the encoding and decoding are computationally feasible, and the communication is reliable. The codewords are linear combinations of subsets of vectors from a given dictionary, with the possible messages indexed by the choice of subset. An adaptive successive decoder is developed, with which communication is demonstrated to be reliable with error probability exponentially small for all rates below the Shannon capacity.

The additive white Gaussian noise channel is basic to Shannon theory and underlies practical communication models. Sparse superposition codes for this channel are introduced and analyzed, and fast encoders and decoders are here invented for which error probability is demonstrated to be exponentially small for any rate below the capacity. The strategy and its analysis merge modern perspectives on statistical regression, model selection and information theory.

The development here provides the first demonstration of practical encoders and decoders, indexed by the size n of the code, with which the communication is reliable for any rate below capacity, with error probability demonstrated to be exponentially small in n, the computational resources required (specified by the number of memory positions and the number of simple processors) demonstrated to be a low order power of n, and the processor computation time demonstrated to be a constant per received symbol.

Performance-scalability is used to refer succinctly to the indicated manner in which the error probability, the code rate, and the computational complexity together scale with the size n of the code.

Such a favorable scalability property is essential to know for a code system: it determines, for a system performing at a given code size, how the performance will thereafter scale (for instance, the improvement in error probability at a given fraction of capacity if the code size is doubled), so as to suitably take advantage of increased computational capability (increased computer memory and processors) sufficient to accommodate the increased code size.

A summary of this work appeared in (Barron and Joseph, ‘Toward fast reliable communication at rates near capacity with Gaussian noise’, IEEE Intern. Symp. Inform. Theory, June 2010), after the date of first disclosure, and thereafter an extensive manuscript has been made publicly available (Barron and Joseph, ‘Sparse Superposition Codes: Fast and Reliable at Rates Approaching Capacity with Gaussian Noise’, www.stat.yale.edu/˜arb4 publications), upon which the present patent manuscript is based. Companion work by the inventors (Barron and Joseph, ‘Least squares superposition coding of moderate dictionary size, reliable at rates up to channel capacity’, IEEE Intern. Symp. Inform. Theory, June 2010) has theory for the optimal least squares decoder, again with exponentially small error probability for any rate below capacity, though that companion work, like all previous theoretical capacity-achieving schemes, is lacking in practical decodability. In both treatments (that given here for practical decoding and that given for impractical optimal least squares decoding) the exponent of the error is shown to depend on the difference between the capacity and the rate. Here, for the practical decoder, the size of the smallest gap from capacity is quantified in terms of design parameters of the code, thereby allowing demonstration of rate approaching capacity as one adjusts these design parameters.

In the familiar communication set-up, an encoder is to map input bit strings u=(u₁, u₂, . . . , u_(K)) of length K into codewords which are length n strings of real numbers c₁, c₂, . . . , c_(n) of norm expressed via the power (1/n)Σ_(i=1)^(n) c_(i)². The average of the power across the 2^(K) codewords is to be not more than P. The channel adds independent N(0, σ²) noise to the codeword, yielding a received length n string Y. A decoder is to map it into an estimate û desired to be a correct decoding of u. Block error is the event û≠u. When the input string is partitioned into sections, the section error rate is the fraction of sections not correctly decoded. The reliability requirement is that, with sufficiently large n, the section error rate is small with high probability or, more stringently, the block error probability is small, averaged over input strings u as well as the distribution of Y. The communication rate R=K/n is the ratio of the number of message bits to the number of uses of the channel required to communicate them.

The supremum of reliable rates of communication is the channel capacity given by C=(½)log₂(1+P/σ²), by traditional Shannon information theory. For practical coding the challenge is to achieve arbitrary rates below the capacity, while guaranteeing reliable decoding in manageable computation time.

In a communication system operating at rate R, the input bit strings arise from input sequences u₁, u₂, . . . cut into successive K bit strings, each of which is encoded and sent, leading to a succession of received length n strings Y. The reliability aim that the block error probability be exponentially small is such that errors are unlikely over long time spans. The computational aim is that coding and decoding computations proceed on the fly, rapidly, with the decoder having not too many pipelined computational units, so that there is only moderate delay in the system.

The development here is specific to the discrete-time channel for which Y_(i)=c_(i)+ε_(i) for i=1, 2, . . . , n, with real-valued inputs and outputs and with independent Gaussian noise. Standard communication models, even in continuous-time, have been reduced to this discrete-time white Gaussian noise setting, or to parallel uses of such, when there is a frequency band constraint for signal modulation and when there is a specified spectrum of noise over that frequency band. Solution to the coding problem, when married to appropriate modulation schemes, is regarded as relevant to myriad settings involving transmission over wires or cables for internet, television, or telephone communications or in wireless radio, TV, phone, satellite or other space communications.

Previous standard approaches, as discussed in Forney and Ungerboeck (IEEE Trans. Inform. Theory 1998), entail a decomposition into separate problems of modulation, of shaping of a multivariate signal constellation, and of coding. For coding purposes, the continuous-time modulation and demodulation may be regarded as given, so that the channel reduces to the indicated discrete-time model. In the approach developed in the invention herein, the shaping of the signal is built directly into the code design and not handled separately.

As shall be reviewed in the section on past work below, there are practical schemes of specific size with empirically good performance.

However, all past works concerning sequences of practical schemes, with rates set arbitrarily below capacity, lack proof that the error probability will be exponentially small in the size n of the code, wherein the exponent of the error probability will depend on the difference between the capacity and the rate. The decoder invented herein is amenable to the desired analysis, providing the first theory establishing that a practical scheme is reliable at rates approaching capacity for the Gaussian channel.

1.1 Sparse Superposition Codes:

The framework for superposition codes is the formation of specific forms of linear combinations of a given list of vectors. This list (or book) of vectors is denoted X₁, X₂, . . . , X_(N). Each vector has n real-valued (or complex-valued) coordinates, for which the codeword vectors take the form of superpositions β₁X₁+β₂X₂+ . . . +β_(N)X_(N).

The vectors X_(j) provide the terms or components of the codewords, with coefficients β_(j). By design, each entry of these vectors X_(j) is independent standard normal. The choice of codeword is conveyed through the coefficients, with sum of squares chosen to match the power requirement P. The received vector is in accordance with the statistical linear model Y=Xβ+ε,

where X is the matrix whose columns are the vectors X₁, X₂, . . . , X_(N) and ε is the noise vector with distribution N(0,σ²I). In some channel models it can be convenient to allow codeword vectors and received vectors to have complex-valued entries, though as there is no substantive difference in the analysis for that case, the focus in the description unfolding here is on the real-valued case. The book X is called the design matrix, consisting of p=N variables, each with n observations, and this list of variables is also called the dictionary of candidate terms.
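
As a minimal illustration of this linear model, the following Python sketch (using numpy; all sizes are illustrative, and the simple constant power allocation P_(j)=P/L is assumed) generates a Gaussian dictionary, a partitioned coefficient vector, and a received string Y=Xβ+ε.

    import numpy as np

    rng = np.random.default_rng(0)
    L, B = 32, 64            # sections and section size (illustrative)
    N, n = L * B, 256        # dictionary size and codelength
    P, sigma = 4.0, 1.0      # power constraint and noise level

    # Dictionary: each entry independent standard normal
    X = rng.standard_normal((n, N))

    # One non-zero coefficient per section, magnitude sqrt(P/L)
    beta = np.zeros(N)
    sent = rng.integers(0, B, size=L)        # term sent in each section
    beta[np.arange(L) * B + sent] = np.sqrt(P / L)

    # Received string per the statistical linear model
    Y = X @ beta + sigma * rng.standard_normal(n)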

For general subset superposition coding the message bit string is arranged to be conveyed by mapping it into a choice of a subset of terms, called sent, with L coefficients non-zero, with specified positive values. Denote B=N/L to be the ratio of dictionary size to the number of terms sent. When B is large, it is a sparse superposition code. In this case the number of terms sent is a small fraction L/N of the dictionary size.

In subset coding, it is known in advance to the encoder and decoder what will be the coefficient magnitude √(P_(j)) if a term is sent. Thus β_(j)²=P_(j)1_(j sent) is equal to P_(j) if the term is sent and equal to 0 otherwise. In the simplest case, the values of the non-zero coefficients are the same, with P_(j)=P/L.

Optionally, the non-zero coefficient values may be +1 or −1 times specified magnitudes, in which case the superposition code is said to be signed. Then the message is conveyed by the sequence of signs as well as the choice of subset.

For subset coding, in general, the set of permitted coefficient vectors β is not an algebraic field, that is, it is not closed under linear operations. For instance, summing two coefficient vectors with distinct sets of L non-zero entries does not yield another such coefficient vector. Hence the linear statistical model here does not correspond to a linear code in the sense of traditional algebraic coding.

In this document particular focus is given to a specialization which the inventors herein call a partitioned superposition code. Here the book X is split into L sections of size B, with one term selected from each, yielding L terms in each codeword. Likewise, the coefficient vector β is split into sections, with one coordinate non-zero in each section to indicate the selected term. Partitioning simplifies the organization of the encoder and the decoder.

Moreover, partitioning allows for either constant or variable power allocation, with P_(j) equal to values P_((l)) for j in section l, where Σ_(l=1)^(L)P_((l))=P. This respects the requirement that Σ_(j sent)P_(j)=P, no matter which term is selected from each section. Set weights π_(j)=P_(j)/P. For any set of terms, its size induced by the weights is defined as the sum of the π_(j) for j in that set. Two particular cases investigated include the case of constant power allocation and the case that the power P_((l)) is proportional to e^(−2C(l−1)/L) for sections l=1, 2, . . . , L. These variable power allocations are used in getting the rate up to capacity.

Most convenient with partitioned codes is the case that the section size B is a power of two. Then an input bit string u of length K=L log₂ B splits into L substrings of size log₂ B and the encoder becomes trivial. Each substring of u gives the index (or memory address) of the term to be sent from the corresponding section.
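
A sketch of this trivial encoder follows (Python; the function name and bit layout are ours for illustration only): each log₂ B bit substring is read directly as the address of the term to send from its section.

    import math

    def encode_indices(bits, L, B):
        # Split K = L*log2(B) input bits into L section addresses.
        log_b = int(math.log2(B))
        assert len(bits) == L * log_b
        return [int("".join(map(str, bits[i*log_b:(i+1)*log_b])), 2)
                for i in range(L)]

    # Example: L = 4 sections of size B = 8, so K = 12 input bits
    print(encode_indices([1,0,1, 0,0,1, 1,1,1, 0,1,0], L=4, B=8))
    # -> [5, 1, 7, 2]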

As said, the rate of the code is R=K/n input bits per channel use, with arbitrary rate R less than C. For the partitioned superposition code this rate is R=(L log B)/n. For specified L, B and R, the codelength n is (L/R)log B. Thus the length n and the subset size L agree to within a log factor.

Control of the dictionary size is critical to computationally advantageous coding and decoding. Possible dictionary sizes are between the extremes K and 2^(K) dictated by the number and size of the sections, where K is the number of input bits. In one extreme, with 1 section of size B=2^(K), the design X is the whole codebook with its columns as the codewords, but the exponential size makes its direct use impractical. At the other extreme there would be L=K sections, each with two candidate terms in subset coding, or two signs of a single term in sign coding with B=1; in which case X is the generator matrix of a linear code.

Between these extremes, computationally feasible, reliable, high-rate codes are constructed with codewords corresponding to linear combinations of subsets of terms in moderate size dictionaries, with fast decoding algorithms. In particular, for the decoder developed here, at a specific sequence of rates approaching capacity, the error probability is shown to be exponentially small in L/(log B)^(3/2).

For high rate, near capacity, the analysis herein requires B to be large compared to (1+snr)², and for high reliability it also requires L to be large compared to (1+snr)², where snr=P/σ² is the signal to noise ratio.

Entries of X are drawn independently from a normal distribution with mean zero and variance 1, so that the codewords Xβ have a Gaussian shape to their distribution and so that the codewords have average power near P. Other distributions for the entries of X may be considered, such as independent equiprobable ±1, with a near Gaussian shape for the codeword distribution obtained by the convolutions associated with sums of terms in subsets of size L.

There is some freedom in the choice of scale of the coefficients. Here the coordinates of the X_(j) are arranged to have variance 1 and the coefficients of β are set to have sum of squares equal to P. Alternatively, one may simplify the coefficient representation by arranging the coordinates of X_(j) to be normal with variance P_(j) and setting the non-zero coefficients of β to have magnitude 1. Whichever of these scales is convenient to the argument at hand is permitted.

1.2 Summary of Findings:

A fast sparse superposition decoder is herein described and its properties analyzed. The inventors herein call it adaptive successive decoding.

For computation, it is shown that with a total number of simple parallel processors (multiplier-accumulators) of order nB, and total memory work space of size n²B, it runs in a constant time per received symbol of the string Y.

For the communication rate, there are two cases. First, when the powers of the terms sent are the same, at P/L in each section, the decoder is shown to reliably achieve rates up to a rate R₀=(½)P/(P+σ²), which is less than capacity. It is close to the capacity when the signal-to-noise ratio is low. It is a deficiency of constant power allocation with the scheme here that its rate will be substantially less than the capacity if the signal-to-noise ratio is not low.

To bring the rate higher, up to capacity, a variable power allocation is used with power P_((l)) proportional to e^(−2C(l−1)/L) for sections l from 1 to L, with improvements from a slight modification of this power allocation for l/L near 1.

To summarize what is achieved concerning the rate, for each B≧2, there is a positive communication rate C_(B) that the decoder herein achieves with large L. This C_(B) depends on the section size B as well as the signal to noise ratio snr=P/σ². It approaches the capacity C=(½)log(1+snr) as B increases, albeit slowly. The relative drop from capacity, Δ_(B)=(C−C_(B))/C, is accurately bounded, except for extremes of small and large snr, by an expression near (1.5+1/v)log log B/log B, where v=snr/(1+snr), with other bounds given to encompass accurately also the small and large snr cases.

Concerning reliability, a positive error exponent function ε(C_(B)−R) is provided for R<C_(B). It is of the order (C_(B)−R)²√(log B) for rates R near C_(B). The sparse superposition code reliably makes not more than a small fraction of section mistakes. Combined with an outer Reed-Solomon code to correct that small fraction of section mistakes, the result is a code with block error probability bounded by an expression exponentially small in Lε(C_(B)−R), which is exponentially small in nε(C_(B)−R)/log B. For a range of rates R not far from C_(B), the error exponent is shown to be within a √(log B) factor of the optimum reliability exponent.

1.3 Decoding Sparse Superposition Codes:

Optimal decoding for minimal average probability of error consists of finding the codeword Xβ, with coefficient vector β of the assumed form, that maximizes the posterior probability, conditioning on X and Y. This coincides, in the case of equal prior probabilities, with the maximum likelihood rule of seeking such a codeword to minimize the sum of squared errors in fit to Y. This is a least squares regression problem min_(β)∥Y−Xβ∥², with codeword constraints on the coefficient vector. There is the concern that exact least squares decoding is computationally impractical. Performance bounds for the optimal decoder are developed in the previously cited companion work, achieving rates up to capacity in the constant power allocation case. Instead, here a practical decoder is developed for which desired properties of reliability and rate approaching capacity are established in the variable power allocation case.

The basic step of the decoder is to compute, for a given vector, initially the received string Y, its inner product with each of the terms in the dictionary, as test statistics, and see which of these inner products are above a threshold. Such a set of inner products for a step of the decoder is performed in parallel by a computational unit, e.g. a signal-processing chip with N=LB parallel accumulators, each of which has pipelined computation, so that the inner product is updated as the elements of the string arrive.

In this basic step, the terms that it decodes are among those for which the test statistic is above threshold. The step either selects all the terms with inner product above threshold, or a portion of these with specified total weight. Having inner product X_(j)^(T)Y above a threshold T=∥Y∥τ corresponds to having normalized inner product X_(j)^(T)Y/∥Y∥ above a threshold τ set to be of the form τ=√(2 log B)+a, where the logarithm is taken using base e. This threshold may also be expressed as √(2 log B)(1+δ_(a)) with δ_(a)=a/√(2 log B). The a is a positive value, free to be specified, that impacts the behavior of the algorithm by controlling the fraction of terms above threshold each step. An ideal value of a is moderately small, corresponding to δ_(a) near 0.75(log log B)/log B, plus log(1+snr)/log B when snr is not small. Having 2δ_(a) near 1.5 log log B/log B plus 4C/log B constitutes a large part of the above mentioned rate drop Δ_(B).
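
The following Python fragment sketches this basic step under the stated threshold rule (the value of a is only illustrative; in the method it is a design parameter set as described above).

    import numpy as np

    def flagged_terms(X, Y, B, a=0.2):
        # Normalized inner products X_j^T Y / ||Y|| as test statistics
        stats = X.T @ Y / np.linalg.norm(Y)
        tau = np.sqrt(2 * np.log(B)) + a    # threshold, natural log
        return np.flatnonzero(stats > tau)  # indices above threshold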

Having the threshold larger than √(2 log B) implies that the fraction of incorrect terms above threshold is negligible. Yet it also means that only a moderate fraction of correct terms are found to be above threshold each step.

A fit is formed at the end of each step by adding the terms that were selected. Additional steps are used to bring the total fraction decoded up near 1.

Each subsequent step of the decoder computes updated test statistics, taking inner products of the remaining terms with a vector determined using Y and the previous fit, and sees which are above threshold. For fastest operation these updates are performed on additional computational units so as to allow pipelined decoding of a succession of received strings. The test statistic can be the inner product of the terms X_(j) with the vector of residuals equal to the difference of Y and the previous fit. As will be explained, a variant of this statistic is developed and found to be somewhat simpler to analyze.

A key feature is that the decoding algorithm does not pre-specify which sections of terms will be decoded on any one step. Rather it adapts the choice in accordance with which sections have a term with an inner product observed to be above threshold. Thus this class of procedures is called adaptive successive decoding.
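
A highly simplified sketch of the residual-based variant is given below (Python). It is illustrative only: it uses the plain residual statistic rather than the adjusted variant analyzed herein, and it assumes the constant power case with known coefficient magnitude coef=√(P/L).

    import numpy as np

    def adaptive_successive_decode(X, Y, B, coef, steps=16, a=0.2):
        n, N = X.shape
        tau = np.sqrt(2 * np.log(B)) + a
        decoded = np.zeros(N, dtype=bool)
        fit = np.zeros(n)
        for _ in range(steps):
            resid = Y - fit
            stats = X.T @ resid / np.linalg.norm(resid)
            stats[decoded] = -np.inf           # already-decoded terms stay fixed
            new = np.flatnonzero(stats > tau)  # sections adapt; none pre-specified
            if new.size == 0:
                break
            decoded[new] = True
            fit += coef * X[:, new].sum(axis=1)
        return np.flatnonzero(decoded)         # indices of flagged terms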

Concerning the advantages of variable power in the partitioned code case, which allows the scheme herein to achieve rates near capacity, the idea is that the power allocations proportional to e^(−2C(l−1)/L) give some favoring to the decoding of the higher power sections among those that remain each step. This produces more statistical power for the test initially, as well as retaining enough discrimination power for subsequent steps.

Such power allocation also would arise if one were attempting to successively decode one section at a time, with the signal contributions of as yet un-decoded sections treated as noise, in a way that splits the rate C into L pieces each of size C/L; however, such pre-specification of one section to decode each step would require the section sizes to be exponentially large to achieve desired reliability. In contrast, in the adaptive scheme herein, many of the sections are considered each step. The power allocations do not change too much across many nearby sections, so that a sufficient distribution of decodings can occur each step.

For rate near capacity, it is helpful to use a modified power allocation, with power proportional to max{e^(−2C(l−1)/L), u_(cut)}, where u_(cut)=e^(−2C)(1+δ_(c)) with a small non-negative value of δ_(c). Thus u_(cut) can be slightly larger than e^(−2C). This modification performs a slight leveling of the power allocation for l/L near 1. It helps ensure that, even in the end game, there will be sections for which the true terms are expected to have inner product above threshold.

Analysis of empirical bounds on the proportions of correct detections involves events shown to be nearly independent across the L sections. The probability with which such proportions differ much from what is expected is exponentially small in the number of sections L. In the case of variable power allocation, the inductive determination of distributional properties herein is seen to require that one work with weighted proportions of events, which are sums across the terms of indicators of events multiplied by the weights provided by π_(j)=P_(j)/P. With bounded ratio of maximum to minimum power across the sections, such weighted proportions agree with un-weighted proportions to within constant factors. Moreover, for indicators of independent events, weighted proportions have similar exponential tail bounds, except that in the exponent, in place of L there is L_(π)=1/max_(j)π_(j), which is approximately a constant multiple of L for the designs investigated here.

1.4 An Update Function:

A key ingredient of this work is the determination of a function g_(L):[0, 1]→[0, 1], called the update function, which depends on the design parameters (the power allocation and the parameters L, B and R) as well as the snr. This function g_(L)(x) determines the likely performance of successive steps of the algorithm. Also, for a variant of the residual-based test statistics, it is used to set weights of combination that determine the best updates of test statistics.

Let {circumflex over (q)}_(k)^(tot) denote the weighted proportion correctly decoded after k steps. A sequence of deterministic values q_(1,k) is exhibited such that {circumflex over (q)}_(k)^(tot) is likely to exceed q_(1,k) each step. The q_(1,k) is near the value g_(L)(q_(1,k−1)) given by the update function, provided the false alarm rate is maintained small. Indeed, an adjusted value q_(1,k)^(adj) is arranged to be not much less than g_(L)(q_(1,k−1)^(adj)), where the 'adj' in the superscript denotes an adjustment to q_(1,k) to account for false alarms.

Determination of whether a particular choice of design parameters provides a total fraction of correct detections approaching 1 reduces to verification that this function g_(L)(x) remains strictly above the line y=x for some interval of the form [0,x*] with x* near 1. The successive values of the g_(L)(x_(k))−x_(k) at x_(k)=q_(1,k−1)^(adj) control the error exponents as well as the size of the improvement in the detection rate and the number of steps of the algorithm. The final weighted fraction of failed detections is controlled by 1−g_(L)(x*).

The role of g_(L) is shown in FIG. 4. This figure provides a plot of the function g_(L)(x) in a specific case. The dots indicate the sequence q_(1,k)^(adj) for 16 steps. Here B=2¹⁶, snr=7, R=0.74 and L is taken to be equal to B. The height reached by the g_(L)(x) curve at the final step corresponds to a weighted correct detection rate target of 0.993, un-weighted 0.986, for a failed detection rate target of 0.014. The accumulated false alarm rate bound is 0.008. The probability of mistake rates larger than these targets is bounded by 4.8×10⁻⁴.

Provision of g_(L)(x) and the computation of its iterates provides a computational device by which a proposed scheme is checked for its capabilities.
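
The check can be expressed in a few lines (Python; demo_g is a toy stand-in of our own, since the closed form of g_(L)(x) is developed later in this document):

    def progression(g, x0=0.0, steps=16):
        # Iterate the update function while the gap g(x) - x stays positive.
        xs = [x0]
        for _ in range(steps):
            nxt = g(xs[-1])
            if nxt <= xs[-1]:
                break
            xs.append(nxt)
        return xs

    demo_g = lambda x: x + 0.08 * (1 - x)  # toy curve with g(x) > x on [0, 1)
    print(progression(demo_g)[-1])         # detection-rate target after 16 steps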

An equally important use of g_(L)(x) is analytical study of the extent of positivity of the gap g_(L)(x)−x depending on the design parameters. For any power allocation there will be a largest rate R at which the gap remains positive over most of the interval [0, 1] for sufficient size L and B. Power allocations with P_((l)) proportional to e^(−2C(l−1)/L), or slight modifications thereof, are shown to be the form required for the gap g_(L)(x)−x to have such positivity for rates R near C.

Analytical examination of the update function shows for large L how the choice of the rate R controls the size of the shortfall 1−x* as well as the minimum size of the gap g_(L)(x)−x for 0≦x≦x*, as functions of B and snr. Thereby bounds are obtained on the mistake rate, the error exponent, and the maximal rate for which the method produces high reliability.

To summarize, with the adaptive successive decoder and suitable power allocation, for rates approaching capacity, the update function stays sufficiently above x over most of [0, 1] and, consequently, the decoder has a high chance of not more than a small fraction of section mistakes.

1.5 Accounting of Section Mistakes:

Ideally, the decoder selects one term from each section, producing an output which is the index of the selected term. It is not in error when the term selected matches the one sent.

In a section, a mistake occurs from an incorrect term above threshold (a false alarm) or from failure of the correct term to provide a statistic value above threshold after a suitable number of steps (a failed detection). Let {circumflex over (δ)}_(mis) refer to the failed detection rate plus the false alarm rate, that is, the sum of the fraction of sections with failed detections and the fraction of sections with false alarms. This sum from the two sources of mistake is at least the fraction of section mistakes, recognizing that both types can occur. The technique here controls this {circumflex over (δ)}_(mis) by providing a small bound δ_(mis) that holds with high probability.

A section mistake is counted as an error if it arises from a single incorrectly selected term. It is an erasure if no term is selected or more than one term is selected. The distinction is that a section error is a mistake you don't know you made, and a section erasure is one you know you made. Let {circumflex over (δ)}_(error) be the fraction of section errors and {circumflex over (δ)}_(erase) be the fraction of section erasures. In each section one sees that the associated indicators of events satisfy the property that 1_(erase)+2·1_(error) is not more than 1_(failed detection)+1_(false alarm). This is because an error event requires both a failed detection and a false alarm. Accordingly 2{circumflex over (δ)}_(error)+{circumflex over (δ)}_(erase) is not more than {circumflex over (δ)}_(mis), the failed detection rate plus the false alarm rate.
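
The accounting can be illustrated with a short Python sketch (the data layout, a list of flagged-term indices per section, is ours for illustration only):

    def section_accounting(flags_per_section, sent_per_section):
        # Error: exactly one term flagged and it is the wrong one.
        # Erasure: no term, or more than one term, flagged.
        errors = erasures = 0
        for flagged, sent in zip(flags_per_section, sent_per_section):
            if len(flagged) == 1:
                errors += (flagged[0] != sent)
            else:
                erasures += 1
        L = len(sent_per_section)
        return errors / L, erasures / L

    # Four sections, term 0 sent in each (hypothetical decoder output):
    print(section_accounting([[0], [3], [], [0, 2]], [0, 0, 0, 0]))
    # -> (0.25, 0.5): one error, two erasures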

1.6 An Outer Code:

An issue with this superposition scheme is that candidate subsets of terms sent could differ from each other in only a few sections. When that is so, the subsets could be difficult to distinguish, so that it would be natural to expect a few section mistakes.

An approach is discussed which completes the task of identifying the terms by arranging sufficient distance between the subsets, using composition with an outer Reed-Solomon (RS) code of rate near one. The alphabet of the Reed-Solomon code is chosen to be a set of size B, a power of 2. Indeed, in this invention it is arranged that the RS symbols correspond to the indices of the selected terms in each section. Details are given in a later section. Suppose the likely event {circumflex over (δ)}_(mis)<δ_(mis) holds from the output of the inner superposition code. Then the outer Reed-Solomon code corrects the small fraction of remaining mistakes, so that the composite decoder ends up not only with small section mistake rate but also with small block error probability. If R_(outer)=1−δ is the communication rate of an RS code, with 0<δ<1, then the section errors and erasures can be corrected, provided δ_(mis)≦δ.

Furthermore, if R_(inner) is the rate associated with the inner (superposition) code, then the total rate after correcting for the remaining mistakes is given by R_(total)=R_(inner)R_(outer), using δ=δ_(mis). Moreover, if Δ_(inner) is the relative rate drop from capacity of the inner code, then the relative rate drop of the composite code Δ_(total) is not more than δ_(mis)+Δ_(inner).
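
For example (Python; illustrative numbers only):

    # Composite rate with an outer RS code matched to the mistake bound
    R_inner = 1.40            # inner superposition code rate (illustrative)
    delta_mis = 0.02          # bound on failed detections plus false alarms
    R_outer = 1 - delta_mis   # RS rate chosen so mistakes are correctable
    print(round(R_inner * R_outer, 3))  # R_total = 1.372 bits per channel use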

The end result, using the theory developed herein for the distribution of the fraction of mistakes of the superposition code, is that for suitable rates up to a value near capacity the block error probability is exponentially small.

One may regard the composite code as a superposition code in which the subsets are forced to maintain at least a certain minimal separation, so that decoding to within a certain distance from the true subset implies exact decoding.

Performance of the sparse superposition code is measured by the three fundamentals of computation, rate, and reliability.

1.7 Computational Resource of Hardware Implementation:

The main computation required of each step of the decoder is the computation of the inner products of the residual vectors with each column of the dictionary, or the computation of related statistics which require the same order of resource. For simplicity, in this subsection the case is described in which one works with the residuals and accepts each term above threshold. The inner products require order nLB multiply-and-adds each step, yielding a total computation of order nLBm for m steps. The ideal number of steps m, according to the bounds obtained herein, is not more than 2+snr log B.
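
As a rough order-of-magnitude check (Python; the sizes are hypothetical), the operation count implied by the bound m ≦ 2+snr log B is:

    import math

    n, L, B, snr = 2048, 256, 512, 7.0
    m = 2 + math.ceil(snr * math.log(B))  # steps bound, natural log
    print(m, n * L * B * m)               # order nLBm multiply-and-adds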

When there is a stream of strings Y arriving in succession at the decoder, it is natural to organize the computations in a parallel and pipelined fashion as follows. One allocates m signal processing chips, each configured nearly identically, to do the inner products. One such chip does the inner products with Y, a second chip does the inner products with the residuals from the preceding received string, and so on, up to chip m, which is working on the final decoding step from the string received several steps before.

Each signal processing chip has in parallel a number of simple processors, each consisting of a multiplier and an accumulator, one for each stored column of the dictionary under consideration, with capability to provide pipelined accumulation of the required sum of products. This permits the collection of inner products to be computed online as the coordinates of the vectors are received. After an initial delay of m received strings, all m chips are working simultaneously.

Moreover, for each chip there is a collection of simple comparators, which compare the computed inner products to the threshold and store, for each column, a flag of whether it is to be part of the update. Sums of the associated columns are computed in updating the residuals (or related vectors) for the next step. The entries of that simple computation (sums of up to L values) are to be provided for the next chip before processing the entries of the next received vector. If need be, to keep the runtime at a constant per received symbol, one arranges 2m chips, alternating between inner product chips and subset sum chips, each working simultaneously, but on strings received up to 2m steps before. The runtime per received entry in the string Y is controlled by the time required to load such an entry (or counterpart residuals on the additional chips) at each processor on the chip and perform in parallel the multiplication by the associated dictionary entries, with the result accumulated for the formation of the inner products.

The terminology signal processing chip refers to computational units that run in parallel to perform the indicated tasks. Whether one or more of these computational units fit on the same physical computer chip depends on the size of the code dictionary and the current scale of circuit integration technology, which is an implementation matter, not a concern at the present level of decoder description.

If each of the signal processing chips keeps a local copy of the dictionary X, alleviating the challenge of numerous simultaneous memory calls, the total computational space (memory positions) involved in the decoder is nLBm, along with space for LBm multiplier-accumulators, to achieve constant order computation time per received symbol. Naturally, there is the alternative of increased computation time with less space; indeed, decoding by serial computation would have runtime of order nLBm.

Substituting L=nR/log B and m of order log B, the computational resource expression nLBm simplifies. One sees that the total computational resource required (either space or time) is of order n²B for this sparse superposition decoder. More precisely, to include the effect of the snr on the computational resource, using the number of steps m which arises in upcoming bounds, which is within 2 of snr log B, and using R upper bounded by capacity C, the computational resource of nLBm memory positions is bounded by C snr n²B, and the number LBm of multiplier-adders is bounded by C snr nB.

In concert with the action of this decoder, the additional computational resource of a Reed-Solomon decoder acts on the indices of which term is flagged from each section to provide correction of the few mistakes. The address of which term is flagged in a section provides the corresponding symbol for the RS decoder, with the understanding that if a section has no term flagged, or more than one term flagged, it is treated as an erasure. For this section, as the literature on RS code computation is plentiful, it is simply noted that the computation resource required is also bounded as a low order polynomial in the size of the code.

1.8 Achieved Rate:

This subsection discusses the nature of the rates achieved with adaptive successive decoding. The invention herein achieves not only fixed rates R<C but also rates R up to C_(B), for which the gap from capacity is of order near 1/log B.

Two approaches are provided for evaluation of how high a rate R is achieved. For any L, B, snr, any specified error probability and any small specified fraction of mistakes of the inner code, numerical computation of the progression of g_(L)(x) permits a numerical evaluation of the largest R for which g_(L)(x) remains above x sufficiently to achieve the specified objectives.

The second approach is to provide simplified bounds to prove analytically that the achieved rate is close to capacity, and to exhibit the nature of the closeness to capacity as a function of snr and B. This is captured by the rate envelope C_(B) and bounds on its relative rate drop Δ_(B). Here contributions to Δ_(B) are summarized, in a way that provides a partial blueprint to later developments. Fuller explanation of the origins of these contributions arises in later sections.

The update function g_(L)(x) is near a function g(x), with difference bounded by a multiple of 1/L. Properties of this function are used to produce a rate expression that ensures that g(x) remains above x, enabling the successive decoding to progress reliably. In the full rate expressions developed later, there are quantities η, h, and ρ that determine error exponents multiplied by L. So for large enough L, these exponents can be taken to be arbitrarily small. Setting those quantities to the values for which these exponents would become 0, and ignoring terms that are small in 1/L, provides simplification giving rise to what the inventors call the rate envelope, denoted C_(B).

With this rate envelope, for R<C_(B), these tools enable us to relate the exponent of reliability of the code to a positive function of C_(B)−R times L, even for L finite.

There are two parts to the relative rate drop bound Δ_(B), which are written as Δ_(shape) plus Δ_(alarm), with details on these in later sections. Here these contributions are summarized to express the form of the bound on Δ_(B).

The second part, denoted Δ_(alarm), is determined by optimizing a combination of rate drop contributions from 2δ_(a), plus a term snr/(m−1) involving the number of steps m, plus terms involving the accumulated false alarm rate. Using the natural logarithm, this Δ_(alarm) is optimized at m equal to an integer part of 2+snr log B and an accumulated baseline false alarm rate of 1/[(3C+½) log B]. At this optimized m and optimized false alarm rate, the value of the threshold parameter δ_(a) is δ_(a)=log[m snr(3+1/(2C))√(log B)/√(4π)]/(2 log B), and Δ_(alarm)=2δ_(a)+2/log B. At the optimized m, the δ_(a) is an increasing function of snr, with value approaching 0.25 log[(log B)/π]/log B for small snr and value near (0.75 log log B+2C−0.25 log(4π/9))/log B for moderately large snr. The constant subtracted in the numerator, 0.25 log(4π/9), is about 0.08. With δ_(a) thus set, it determines the value of the threshold τ=√(2 log B)(1+δ_(a)).

To obtain a small 2δ_(a), and hence small Δ_(alarm), this bound requires log B large compared to 4C, which implies that the section size B is large compared to (1+snr)².

The Δ_(shape) depends on the choice of the variable power allocation rule, via the function g and its shape. For a specified power allocation, it is determined by a minimal inner code rate drop contribution at which the function has a non-negative gap g(x)−x on [0, x*], plus the contribution to the outer code rate drop associated with the weighted proportion not detected δ*=1−g(x*). For determination of Δ_(shape), three cases for power allocation are examined and then, for each snr, the one with the best such tradeoff is picked, which includes determination of the best x*. The result of this examination is a Δ_(shape) which is a decreasing function of snr.

The first case has no leveling (δ_(c)=0). In this case the function g(x)−x is decreasing for suitable rates. Using an optimized x*, it provides a candidate Δ_(shape) equal to 1/τ² plus a second, explicitly given term of order 1/(τC). If snr is not large, this case does not accomplish the aims because the term involving 1/τ, near 1/√(2 log B), is not nearly as small as will be demonstrated in the leveling case. Yet with snr such that C is large compared to τ, this Δ_(shape) is acceptable, providing a contribution to the rate drop near the value 1/(2 log B). Then the total rate drop is primarily determined by Δ_(alarm), yielding, for large snr, that Δ_(B) is near (1.5 log log B+4C+2.34)/log B. This case is useful for a range of snr where C exceeds a multiple of √(log B) yet remains small compared to log B.

The second case has some leveling, with 0<δ_(c)<snr. In this case the typical shape of the function g(x)−x, for x in [0, 1], is that it undergoes a single oscillation, first going down, then increasing, and then decreasing again, so there are two potential minima for x in [0, x*], one of which is at x*. In solving for the best rate drop bound, a role is demonstrated for the case that δ_(c) is such that an equal gap value is reached at these two minima. In this case, with optimized x*, a bound on Δ_(shape) is shown, for a range of intermediate size signal to noise ratios, to be given by the expression (2/(v log B)){2+log(½+vτ/(4C√(2π)))}+1/(2 log B), where v=snr/(1+snr). When 2C/v is small compared to τ/√(2π), this Δ_(shape) is near (1/v)(log log B)/(log B). When added to Δ_(alarm) it provides an expression for Δ_(B), as previously given, that is near (1.5+1/v)log log B/log B plus terms that are small in comparison.

The above expression provides the Δ_(shape) as long as snr is not too small and 2C/v is less than τ/√(2π). For 2C/v at least τ/√(2π), the effect of the log log B is canceled, though there is then an additional small remainder term that is required to be added to the above, as detailed later. The result is that Δ_(shape) is less than const/log B for (2C/v)√(2π) at least τ.

The third case uses constant power allocation (complete leveling with δ_(c)=snr), when snr is small. The Δ_(shape) is less than a given bound near √(2(log log B)/log B) when the snr is less than twice that value. For such sufficiently small snr this Δ_(shape) with complete leveling becomes superior to the expression given above for partial leveling.

Accordingly, let Δ_(shape) be the best of these values from the three cases, producing a continuous decreasing function of snr, near √(2(log log B)/log B) for small snr, near (1+1/snr)log log B/(log B) for intermediate snr, and near 1/(2 log B) for large snr. Likewise, the Δ_(B) bound is Δ_(shape)+Δ_(alarm). In this way one has the dependence of the rate drop on snr and section size B.

Thus let C_(B) be the rate of the composite sparse superposition inner code and Reed-Solomon outer code obtained from optimizing the total relative rate drop bound Δ_(B).

Included in Δ_(alarm) and Δ_(shape), which sum to Δ_(B), are baseline values of the false alarm rates and the failed detection rates, respectively, which add to provide a baseline value δ_(mis)*, and, accordingly, Δ_(B) splits as δ_(mis)* plus Δ_(B,inner), using the relative rate drop of the inner code. As detailed later, this δ_(mis)* is typically small compared to the rate drop sources from the inner code.

In putting the ingredients together, when R is less than C_(B), part of the difference C_(B)−R is used in providing a slight increase past the baseline to determine a reliable δ_(mis), and the rest of the difference is used in setting the inner code rate to insure a sufficiently positive gap g(x)−x for reliability of the decoding progression. The relative choices are made to produce the best resulting error exponent Lε(C_(B)−R) for the given rate.

1.9 Comparison to Least Squares:

It is appropriate to compare the rate achieved by the practical decoder herein with what is achieved with theoretically optimal, but possibly impractical, least squares decoding of these sparse superposition codes, subject to the constraint that there is one non-zero coefficient in each section. Such least squares decoding provides the stochastically smallest distribution of the number of mistakes, with a uniform distribution on the possible messages, but it has an unknown computation time.

In this direction, the results in the previously cited companion paper for least squares decoding of superposition codes partially complement what is given herein for the adaptive successive decoder. For optimum least squares decoding, favorable properties are demonstrated in the case that the power assignments P/L are the same for each section. Interestingly, the analysis techniques there are different and do not reveal rate improvement from the use of variable instead of constant power with optimal least squares decoding. Another difference is that while here there are no restrictions on B, there it is required that B≧L^(b) for a specified section size rate b depending only on the signal-to-noise ratio, where conveniently b tends to 1 for large signal-to-noise, but unfortunately b gets large for small snr. For comparison with the scheme here, restrict attention to moderate and large signal-to-noise ratios, as for computational reasons it is desirable that B be not more than a low order polynomial in L.

Let Δ=(C−R)/C be the rate drop from capacity, with R not more than C. For least squares decoding there is a positive constant c₁ such that the probability of more than a fraction δ_(mis) of mistakes is less than exp{−nc₁ min{Δ², δ_(mis)}}, for any δ_(mis) in [0,1], any positive rate drop Δ and any size n. This bound is better than obtained for the practical decoder herein in its freedom of any choice of mistake fraction and rate drop in obtaining this reliability. In particular, the result for least squares does not restrict Δ to be larger than Δ_(B) and does not restrict δ_(mis) to be larger than a baseline value of order 1/log B.

It shows that n only needs to be of size [log(1/ε)]/[c₁ min{Δ², δ_(mis)}] for least squares to achieve probability ε of at least a fraction δ_(mis) of mistakes, at a rate that is Δ close to capacity. With suitable target fractions of mistakes, the drop from capacity Δ is not more than √((1/c₁n)log 1/ε). It is of order 1/√(n) if ε is fixed; whereas, for ε exponentially small in n, the associated drop from capacity Δ would need to be at least a constant amount.

An appropriate domain for comparison is in a regime between the extremes of fixed probability ε and a probability exponentially small in n. The probability of error is made nearly exponentially small if the rate is permitted to slowly approach capacity. In particular, suppose B is equal to n or a small order power of n. Pick Δ of order 1/log B to within iterated log factors, arranged such that the rate drop Δ exceeds the envelope Δ_(B) by an amount of that order 1/log B. One can ask, for a rate drop of that moderately small size, how would the error probability of least squares and the practical method compare? At a suitable mistake rate, the exponent of the error probability of least squares would be quantified by n/(log B)², of order n/(log n)², neglecting log log factors. Whereas, for the practical decoder herein, the exponent would be a constant times L(Δ−Δ_(B))²(log B)^(1/2), which is of order L/(log B)^(1.5), that is, n/(log n)^(2.5). Thus the exponent for the practical decoder is within a (log n)^(0.5) factor of what is obtained for optimal least squares decoding.

1.10 Comparison to the Optimal Form of Exponents:

It is natural to compare the rate, reliability, and code-size tradeoff that is achieved here, by a practical scheme, with what is known to be theoretically best possible. What is known concerning the optimal probability of error, established by Shannon and Gallager, as reviewed for instance in work of Polyanskiy, Poor and Verdú (IEEE IT 2010), is that the optimal probability of error is exponentially small in an expression nε(R) which, for R near C, matches nΔ² to within a factor bounded by a constant, where Δ=(C−R)/C. As recently refined in the work of Altug and Wagner (IEEE ISIT 2010), this behavior of the exponent remains valid for Δ down to the order remaining larger than 1/√(n). The reason for that restriction is that for Δ as small as order 1/√(n), the optimal probability of error does not go to zero with increasing block length (rather it is then governed by an analogous expression involving the tail probability of the Gaussian distribution). It is reiterated that these optimal exponents are associated with analyses which provided no practical scheme in the literature to achieve them.

The bounds for the practical decoder do not rely on asymptotics, but rather finite sample bounds available for all choices of L and B and inner code rates R≦C_(B), with blocklength n=(L log B)/R. As derived herein, the overall error probability bound is exponentially small in an expression of the form L min{Δ, Δ²√(log B)}, provided R is enough less than C_(B) that the additional drop C_(B)−R is of the same order as the total drop Δ. Consequently, the error probability is exponentially small in n min{Δ/log B, Δ²/√(log B)}. Focusing on the Δ for which the square term is the minimizer, it shows that the error probability is exponentially small in n(C−R)²/√(log B), within a √(log B) factor of optimal, for rates R for which C_(B)−R is of order between log log B/log B and 1/√(log B).

An alternative perspective on the rate and reliability tradeoff, as in Polyanskiy, Poor, and Verdú, is to set a small block error probability ε and seek the largest possible communication rate R_(opt) as a function of the codelength. They show, for n of at least moderate size, that this optimal rate is near R_(opt)=C−(√(V)/√(n))√(2 log 1/ε), for a constant V they identify, where if ε is not small the √(2 log 1/ε) is to be replaced by the upper ε quantile of the standard normal. For small ε this expression agrees with the form of the relationship between error probability ε and the exponent n(C−R_(opt))² stated above. The rates and error probabilities achieved with the practical decoder herein have a similar form of relationship but differ in three respects. One is that here there is the somewhat smaller n/√(log B) in place of n; secondly, the constant multipliers do not match the optimal V; and thirdly, the result herein is only applicable for ε small enough that the rate drop is made to be at least Δ_(B).

From either of these perspectives, the results here show that to gain provable practicality a price is paid of needing blocklength larger by a factor of √(log B) to have the same performance as would be optimal without concern for practicality.

1.11 On the Signal Alphabet and Shaping:

From the cited review by Forney and Ungerboeck, as previously said, the problem of practical communication for additive Gaussian noise channels has been decomposed into separate problems, which in addition to modulation include the matters of choice of signal alphabet, of the shaping of a signal constellation, and of coding. The approach taken herein merges the signal alphabet and constellation into the coding. The values of codeword symbols that arise herein are those that can be realized via sums of columns of the dictionary, one from each section in the partitioned case. Some background on signalling facilitates discussion of the relationship to other work.

By choice of signal alphabet, codes for discrete channels have been adapted to use on Gaussian channels, with varying degrees of success. In the simplest case the code symbols take on only two possible values, leading to a binary input channel, by constraining the symbol alphabet to allow only the values ±√(P) and possibly using only the signs of the Y_(i). With such binary signalling, the available capacity is not more than 1 and it is considerably less than (½)log(1+snr), except in the case of low signal-to-noise ratio. When considering snr that is not small it is preferable not to restrict to binary signalling, to allow higher rates of communication. When using signals where each symbol has a number M of levels, the rate caps at log M, which is achieved in the high snr limit even without coding (simply infer for each symbol the level to which the received Y_(i) is closest). As quantified in Forney and Ungerboeck, for moderate snr, treating the channel as a discrete M-ary channel of particular cross-over probabilities and considering associated error-correcting codes allows, in theory, for reasonable performance provided log M sufficiently exceeds log snr (and empirically good coding performance has been realized by LDPC and turbo codes). Nevertheless, as they discuss, the rate of such discrete channels with a fixed number of levels remains less than the capacity of the original Gaussian channel.

To bring the rate up to capacity, the codeword choices must be properly shaped, that is, the codeword vectors should approximate a good packing of points on the n-dimensional sphere of squared radius dictated by the power. An implication is that, marginally and jointly for any subset of codeword coordinates, the set of codewords should have empirical distribution not far from Gaussian. Such shaping is likewise a problem for which theory dictates what is possible in terms of rate and reliability, but theory has been lacking to demonstrate whether there is a moderate or low complexity of decoding that achieves such favorable rate and error probability.

The sparse superposition code automatically takes care of the required shaping of the multivariate signal constellation by using linear combinations of subsets of a given set of real-valued Gaussian distributed vectors. For high snr, the role of log M being large compared to log snr is replaced herein by having L large and having log B large compared to C. These sparse superposition codes are not exactly well-spaced on the surface of an n-sphere, as inputs that agree in most sections would have nearby codewords. Nevertheless, when coupled with the Reed-Solomon outer code, sufficient distance between codewords is achieved for quantifiably high reliability.

1.12 Relationships to Previous Work:

Several directions of past work are discussed that connect to what is developed here. There is some prior work concerning computational feasibility for reliable communications near capacity for certain channels. Building on Gallager's low density parity check codes (LDPC), iterative decoding algorithms based on statistical belief propagation in loopy networks have been empirically shown in various works to provide reliable and moderately fast decoding at rates near the capacity for various discrete channels, and mathematically proven to provide such properties only in the special case of the binary erasure channel in Luby, Mitzenmacher, Shokrollahi, and Spielman (IEEE IT 2001). Belief networks are also used for certain empirically good schemes such as turbo codes that allow real-valued received symbols, with a discrete set of input levels. Though substantial steps have been made in the analysis of such belief networks, as summarized for instance in the book by Richardson and Urbanke (2008), there is no proof of the desired properties at rates near capacity for the Gaussian channel.

When both schemes are set up with rate near capacity, the critical distinction between empirically demonstrated code performance (as in the case of LDPC and turbo codes) and a quantified exponential scaling of error probability (as with the sparse superposition code with adaptive successive decoder) is precisely what is asserted herein by the exponential scaling of error probability. With such error rate scaling, as the size of the code doubles while maintaining the same communication rate R, the error probability is squared. For instance, an error probability of 10⁻⁴ then reduces to 10⁻⁸; likewise, multiplying the code size by 4 would reduce the error probability to 10⁻¹⁶.

In the progression of available computational ability, such doubling of the size of memory allocatable to the same size chip is a customary matter in computer chip technology that has occurred every couple of years. With the code invented herein one knows that the reliability will scale in such an attractive way as computational resources improve. With LDPC and turbo codes one might guess that the error probability will likewise improve, but with those technologies (and all other existing code technologies) it cannot be definitively asserted. Empirical simulation cannot come to the rescue when planning for several years ahead. Until there are the computational resources to implement such increased size devices, one cannot know whether the investment in existing code strategies (other than ours) will be rewarded when they are increased in size.

An approach to reliable and computationally-feasible decoding, with restriction to binary signaling, is in the work on channel polarization of Arikan (IEEE IT 2009) and Arikan and Telatar (IEEE IT 2010). Error probability is demonstrated there at a level exponentially small in n^(1/2) for fixed rates less than the binary signaling capacity. In contrast, for the scheme herein, the error probability is exponentially small in n to within a logarithmic factor, and communication is permitted at higher rates than would be achieved with binary signalling, approaching capacity for the Gaussian noise channel. In recent work Emmanuel Abbe adapts channel polarization to achieve the sum rate capacity for m user binary input multiple-access channels, with specialization to single-user channels with 2^(m) inputs. Building on that work, Abbe and Barron (IEEE ISIT 2011) are investigating discrete near-Gaussian signalling to adapt channel polarization to the Gaussian noise channel. That provides an alternative interesting approach to achieving rates up to capacity for the Gaussian noise, but not with error exponents exponentially small in n.

The analysis of concatenated codes in the book of Forney (1966) is an important fore-runner to the development of code composition given herein. For the theory, he paired an outer Reed-Solomon code with concatenation of optimal inner codes of Shannon-Gallager type, while, for practice, focussed on binary input channels, he paired such an outer Reed-Solomon code with inner codes based on linear combinations of orthogonal terms (for target rates K/n less than 1 such a basis is available), in which all binary coefficient sequences are possible codewords.

A challenge concerning theoretically good inner codes is that the number of messages searched is exponentially large in the inner codelength. Forney made the inner codelength of logarithmic size compared to the outer codelength as a step toward a practical solution. However, caution is required with these strategies. Suppose the rate of the inner code has a small relative drop from capacity, Δ=(C−R)/C. For at least moderate reliability, the inner codelength would be of order at least 1/Δ². So the required outer codelength becomes exponential in 1/Δ².

To compare, for the Gaussian noise channel, the approach herein provides a practical decoding scheme for the inner code. Herein, inner and outer codelengths are permitted that are comparable to each other. One can draw a parallel between the sections described here and the concatenations of Forney's inner codes. However, a key difference is the use herein of superposition across the sections and the simultaneous decoding of these sections. Challenges remain in the restrictiveness of the relationship of the rate drop Δ to the section sizes. Nevertheless, particular rates are identified as practical and near optimal.

By having set up the channel coding via the linear model Y=Xβ+ε with a sparse coefficient vector β, it is appropriate to discuss the relationships of the iterative communication decoder here with other iterative algorithms for statistical signal recovery. Though some similarities are here-below described, an important distinction is that previously obtained constrained least squares coefficient estimators have not been developed for communication at rates near capacity for the Gaussian channel.

A class of algorithms for seeking fits of the form Xβ to an observed response vector Y are those designed for the task of finding the least squares convex projection. This projection can be either to the convex hull of the columns of the dictionary X or, for the present problem, to the convex hull of the sums of columns, one from each section in the partitioned case. The relaxed greedy algorithm is an iterative procedure that solves such problems, applying previous theory by Lee Jones (1992), Barron (1993), Lee, Bartlett and Williamson (1996), or Barron, Cohen, et al. (2007). Each pass of the relaxed greedy algorithm is analogous to the steps of the decoding algorithm developed herein, though with an important distinction. This convex projection algorithm finds in each section the term of highest inner product with the residuals from the previous iteration and then uses it to update the convex combination. Accordingly, like the adaptive decoder here, this algorithm has computation resource requirements linear in the product of the size of the dictionary and the number of iterations. The cited theory bounds the accuracy of the fit to the projection as a function of the number of iterations.

The distinction is that convex projection seeks convex combinations of vertices, whereas the decoding problem here can be regarded as seeking the best vertex (or a vertex that agrees with it in most sections). Both algorithms embody a type of relaxation. The Jones-style relaxation is via the convex combination down-weighting the previous fits. The adaptive successive decoder instead achieves relaxation by leaving sections un-decoded on a step if the inner product is not yet above threshold. At any given step the section fit is restricted to be a vertex or 0.

The inventors have conducted additional analysis of convex projection in the case of equal power allocation in each section. An approximation to the projection can be characterized which has largest weight, in most sections, at the term sent, when the rate R is less than R₀; whereas for larger rates the weights of the projection are too spread across the terms in the sections to identify what was sent. To get to the higher rates, up to capacity, one cannot use such convex projection alone. Variable section power may be necessary in the context of such algorithms. It is advantageous to conduct a more structured iterative decoding, which is more explicitly targeted to finding vertices, as presented here.

The conclusions concerning communication rate may also be expressed in the language of sparse signal recovery and compressed sensing. A number of terms selected from a dictionary is linearly combined and subject to noise. Suppose a value B is specified for the ratio of the number of variables divided by the number of terms. For signals of the form Xβ with β satisfying the design stipulations used here, recovery of these terms from the received noisy Y of length n is possible provided the number of terms L satisfies L≦Rn/log B. Equivalently, the number of observations sufficient to determine L terms satisfies n≧(1/R)L log B. In this signal recovery story, the factor 1/R is not arising as the reciprocal of rate, but rather as the constant multiplying L log(N/L) determining the sample size requirement.

Our results interpreted in this context show for practical recovery that there is the R<R₀ limitation in the equal power allocation case. For other power allocations designed here, recovery by other means is possible at higher R up to the capacity. Thus our practical solution to the communications capacity problem also provides a practical solution to the analogous signal recovery problem, as well as demonstration of the best constant for signal recovery for a certain behavior of the non-zero coefficients.

These conclusions complement work on sparse signal recovery by Wainwright (IEEE IT 2009a,b), Fletcher, Rangan, Goyal (IEEE IT 2009), Donoho, Elad, Temlyakov (IEEE IT 2006), Candès and Plan (Ann. Statist. 2009), Tropp (IEEE IT 2006), and Tong Zhang. In summary, their work shows that for reliable determination of L terms from noisy measurements, having the number of such measurements n be of order L log B is sufficient, and is achieved by various estimators (including convex optimization with an ℓ₁ control on the coefficients as in Wainwright, and a forward stepwise regression algorithm in Zhang analogous to the greedy algorithms discussed above). Their results for signal recovery, when translated into the setting of communications, yield reliable communications with positive rate, but do not allow rates up to capacity. Wainwright (IEEE IT 2009a,b) makes repeated use of information-theoretic techniques including the connection with channel coding, allowing use of Fano's inequality to give converse-like bounds on sparse signal recovery. His work shows, for the designs he permits, that ℓ₁ constrained convex optimization does not perform as well as the information-theoretic limits. As said above, the work herein takes it further, identifying the rates achieved in the constant power allocation case, as well as identifying practical strategies that do achieve rates up to the information-theoretic capacity, for specific variable power allocations.

The ideas of superposition codes, rate splitting, and successive decoding for Gaussian noise channels began with Cover (IEEE IT 1972) in the context of multiple-user channels. In that setting what is sent is a sum of codewords, one for each message. Instead the inventors herein are putting that idea to use for the original Shannon single-user problem. The purpose here of computational feasibility is different from the original multi-user purpose, which was characterization of the set of achievable rates. The ideas of rate splitting and successive decoding originating in Cover for Gaussian broadcast channels were later developed also for Gaussian multiple-access channels, where in the rate region characterizations of Rimoldi and Urbanke (IEEE IT 2001) and of Cao and Yeh (IEEE IT 2007) rate splitting is in some cases applied to individual users. For instance, with equal size rate splits there are 2^(nR/L) choices of code pieces, corresponding to the sections of the dictionary as used here.

So the applicability of superposition of rate-split codes for a single-user channel has been noted, albeit the rate splitting in the traditional information theory designs has exponential size 2^(nR/L) to gain reliability. Feasibility for such channels has been lacking in the absence of demonstration of reliability at high rate with superpositions from polynomial size dictionaries. In contrast, success herein is built on the use of sufficiently many pieces (sections), with L of order n to within log factors, such that the section sizes B=2^(nR/L) become moderate (e.g. also of order n). Now with such moderate B one cannot assure reliability of direct successive decoding. As said, to overcome that difficulty, adaptation rather than pre-specification of the set of sections decoded each step is key to the reliability and speed of the decoder invented here.

It is an attractive feature of the superposition based solution obtained herein for the single-user channel that it is amenable to extension to practical solution of the corresponding multi-user channels, namely, the Gaussian multiple access and Gaussian broadcast channels.

Accordingly, the invention is understood to include those aspects of practical solution of Gaussian noise broadcast channels and multiple-access channels that arise directly from combining the single-user style adaptive successive decoder analysis here with the traditional multi-user rate splitting steps.

Outline of Manuscript:

After some preliminaries, section 3 describes the decoder. In Section 4 the distributions of the various test statistics associated with the decoder are analyzed. In particular, the inner product test statistics are shown to decompose into normal random variables plus a nearly constant random shift for the terms sent. Section 5 demonstrates the increase for each step of the mean separation between the statistics for terms sent and terms not sent. Section 6 sets target detection and alarm rates. Reliability of the algorithm is established in section 7, with demonstration of exponentially small error probabilities. Computational illustration is provided in section 8. A requirement of the theory is that the decoder satisfies a property of accumulation of correct detections. Whether the decoder is accumulative depends on the rate and the power allocation scheme. Specialization of the theory to a particular variable power allocation scheme is presented in section 9. The closeness to capacity is evaluated in section 10. Lower bounds on the error exponent are in section 11. Refinements of closeness to capacity are in section 12. Section 13 discusses the use of an outer Reed-Solomon code to correct any mistakes from the inner decoder. The appendix collects some auxiliary matters.

2 Some Preliminaries

Notation:

For vectors a, b of length n, let ∥a∥² be the sum of squares of coordinates, let |a|²=(1/n)Σ_(i=1)^(n) a_(i)² be the average square, and let respectively a^(T)b and a·b=(1/n)Σ_(i=1)^(n) a_(i)b_(i) be the associated inner products. It is more convenient to work with |a| and a·b.
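For concreteness, these two normalized quantities may be computed as follows (a trivial sketch; the helper names are ours, for illustration only):

    import numpy as np

    def avg_square(a):
        # |a|^2 = (1/n) sum_i a_i^2, the average square
        return np.mean(a ** 2)

    def norm_inner(a, b):
        # a.b = (1/n) sum_i a_i b_i, the normalized inner product
        return np.mean(a * b)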

Setting of Analysis:

The dictionary is randomly generated. For the purpose of analysis of the average probability of error, or the average probability of at least a certain fraction of mistakes, properties are investigated with respect to the joint distribution of the dictionary and the noise.

The noise ε and the X_(j) in the dictionary are jointly independent normal random vectors, each of length n, with mean equal to the zero vector and covariance matrices equal to σ²I and I, respectively. These vectors have n coordinates indexed by i=1, 2, . . . , n, which may be called the time index. Meanwhile J is the set of term indices j corresponding to the columns of the dictionary, which may be organized as a union of sections. The codeword sent is from a selection of L terms. The cardinality of J is N and the ratio B=N/L.

Corresponding to an input, let sent={j₁, j₂, . . . , j_(L)} be the indices of the terms sent and let other=J−sent be the set of indices of all other terms in the dictionary. Component powers P_(j) are specified, such that Σ_(j sent) P_(j)=P. The simplest setting is to arrange these component powers to be equal, P_(j)=P/L. Though for best performance, there will be a role for component powers that are different in different portions of the dictionary. The coefficients for the codeword sent are β_(j)=√(P_(j)) 1_(j sent). The received vector is

$Y = \sum_{j} \beta_{j} X_{j} + \varepsilon.$

Accordingly, X_(j) and Y are joint normal random vectors, with expected product between coordinates and hence expected inner product 𝔼[X_(j)·Y] equal to β_(j). This expected inner product has magnitude √(P_(j)) for the terms sent and 0 for the terms not sent. So the statistics X_(j)·Y are a source of discrimination between the terms.

Note that each coordinate of Y has expected square σ_(Y)²=P+σ² and hence 𝔼[|Y|²]=P+σ².
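A small simulation of this setting is sketched below; the sizes L, B, n and the unit powers are illustrative assumptions only. It generates the dictionary, selects one term per section, and confirms that the average square of Y is near P+σ².

    import numpy as np

    rng = np.random.default_rng(0)
    L, B, n = 32, 64, 512                  # illustrative sizes; N = L*B
    P, sigma2 = 1.0, 1.0                   # total power and noise variance
    N = L * B

    X = rng.standard_normal((n, N))        # dictionary of N(0, I) columns
    sent = np.array([l * B + rng.integers(B) for l in range(L)])
    beta = np.zeros(N)
    beta[sent] = np.sqrt(P / L)            # equal component powers P_j = P/L

    Y = X @ beta + np.sqrt(sigma2) * rng.standard_normal(n)
    print(np.mean(Y ** 2), P + sigma2)     # |Y|^2 is near P + sigma^2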

Exponential Bounds for Relative Frequencies:

In the distributional analysis repeated use is made of simple large deviations inequalities. In particular, if q̂ is the relative frequency of occurrence of L independent events with success probability q*, then for q<q* the probability of the event {q̂<q} is not more than the Sanov-Csiszár bound e^(−LD(q∥q*)), where the exponent D(q∥q*)=D_(Ber)(q∥q*) is the relative entropy between Bernoulli distributions. This information-theoretic bound subsumes the Hoeffding bound e^(−2(q*−q)²L) via the Csiszár-Kullback inequality that D exceeds twice the square of the total variation, which here is D≧2(q*−q)². An extension of the information-theoretic bound to cover weighted combinations of indicators of independent events is in Lemma 46 in the appendix, and slight dependence among the events is addressed through bounds on the joint distribution. The role of q̂ is played by weighted counts, for j in sent, of test statistics being above threshold.

In the same manner, one has that if p̂ is the relative frequency of occurrence of independent events with success probability p*, then for p>p* the probability of the event {p̂>p} has a large deviation bound with exponent D_(Ber)(p∥p*). In the use here of such bounds, the role of p̂ is played by the relative frequency of false alarms, based on occurrences, for j in other, of test statistics being above threshold. Naturally, in this case, both p and p* are arranged to be small, with some control on the ratio between them. It is convenient to make use of lower bounds on D_(Ber)(p∥p*), as detailed in Lemma 47 in the appendix, which include what may be called the Poisson bound p log p/p*+p*−p and the Hellinger bound 2(√p−√(p*))², both of which exceed (p−p*)²/(2p). All three of these lower bounds are superior to the variation bound 2(p−p*)² when p is small.
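These bounds are easy to check numerically; in the sketch below the function d_ber and the chosen levels p*, p are illustrative.

    import numpy as np

    def d_ber(p, q):
        # relative entropy D(p||q) between Bernoulli(p) and Bernoulli(q)
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

    p_star, p = 1e-3, 5e-3                 # small false alarm levels, p > p*
    exact     = d_ber(p, p_star)
    poisson   = p * np.log(p / p_star) + p_star - p
    hellinger = 2 * (np.sqrt(p) - np.sqrt(p_star)) ** 2
    variation = 2 * (p - p_star) ** 2
    # each lower bound sits below the exact exponent; for small p the
    # Poisson and Hellinger bounds dominate the variation bound
    print(exact, poisson, hellinger, variation)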

3 The Decoder

From the received Y and knowledge of the dictionary, decode which terms were sent by an iterative procedure now specified more fully.

The first step is as follows. For each term X_(j) of the dictionary compute the inner product with the received string, X_(j)^(T)Y, as a test statistic and see if it exceeds a threshold T=∥Y∥τ. Denote the associated event ℋ_(j)={X_(j)^(T)Y≧T}. In terms of a normalized test statistic this first step test is the same as comparing 𝒵_(1,j) to a threshold τ, where 𝒵_(1,j)=X_(j)^(T)Y/∥Y∥, the distribution of which will be shown to be that of a standard normal plus a shift by a nearly constant amount, where the presence of the shift depends on whether j is one of the terms sent. Thus ℋ_(j)={𝒵_(1,j)≧τ}. The threshold is chosen to be τ=√(2 log B)+a. The idea of the threshold on the first step is that very few of the terms not sent will be above threshold. Yet a positive fraction of the terms sent, determined by the size of the shift, will be above threshold and hence will be correctly decoded on this first step.

Let thresh₁={jεJ: 1_(ℋ_(j))=1} be the set of terms with the test statistic above threshold and let above₁ denote the fraction of such terms. In the variable power case it is a weighted fraction, above₁=Σ_(jεthresh₁) P_(j)/P, weighted by the power P_(j). The strategy is to restrict decoding on the first step to terms in thresh₁ so as to avoid false alarms. The decoded set is either taken to be dec₁=thresh₁ or, more generally, a value pace₁ is specified and, considering the terms in J in order of decreasing 𝒵_(1,j), include in dec₁ as many as can be with Σ_(jεdec₁) π_(j) not more than min{pace₁, above₁}. Let DEC₁ denote the cardinality of the set dec₁.

The output of the first step consists of the set of decoded terms dec₁ and the vector F₁=Σ_(jεdec₁) √(P_(j)) X_(j), which forms the first part of the fit. The set of terms investigated in step 1 is J₁=J, the set of all columns of the dictionary. Then the set J₂=J₁−dec₁ remains for second step consideration. In the extremely unlikely event that DEC₁ is already at least L there will be no need for the second step.

A natural way to conduct subsequent steps would be as follows. For the second step compute the residual vector r₂=Y−F₁. For each of the remaining terms, i.e. terms in J₂, compute the inner product with the vector of residuals, that is, X_(j)^(T)r₂, or its normalized form 𝒵_(j)^(r)=X_(j)^(T)r₂/∥r₂∥, which may be compared to the same threshold τ=√(2 log B)+a, leading to a set dec₂ of decoded terms for the second step. Then compute F₂=Σ_(jεdec₂) √(P_(j)) X_(j), the fit vector for the second step.
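This residual recomputation, for the second and all subsequent steps described next, might be sketched as follows; initializing the fit at zero makes the first pass coincide with step 1, and powers[j] denotes the section power assigned to column j, assumed known to the decoder (the names and the step cap are illustrative).

    import numpy as np

    def residual_decoder(X, Y, powers, L, B, a=0.5, max_steps=30):
        n, N = X.shape
        tau = np.sqrt(2 * np.log(B)) + a
        decoded = np.zeros(N, dtype=bool)
        fit = np.zeros(n)
        for _ in range(max_steps):
            if decoded.sum() >= L:
                break                            # L terms decoded: stop
            r = Y - fit                          # r_k = Y - (F_1 + ... + F_{k-1})
            stats = X.T @ r / np.linalg.norm(r)
            stats[decoded] = -np.inf             # test only terms still in J_k
            new = stats >= tau
            if not new.any():
                break                            # nothing above threshold
            decoded |= new
            fit += X[:, new] @ np.sqrt(powers[new])   # add F_k to the fit
        return decoded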

The third and subsequent steps would proceed in the same manner as the second step. For any step k, one computes the residual vector r_(k)=Y−(F₁+ . . . +F_(k−1)). For terms in J_(k)=J_(k−1)−dec_(k−1), one gets thresh_(k) as the set of terms for which X_(j)^(T)r_(k)/∥r_(k)∥ is above τ. The set of decoded terms is either taken to be thresh_(k) or a subset of it. The decoding stops when the cardinality of the set of all decoded terms reaches L or there are no terms above threshold on a particular step.

3.1 Statistics from Adaptive Orthogonal Components:

A variant of the above algorithm from the second step onwards is described, which is found here to be easier to analyze. The idea is that the ingredients Y, F₁, . . . , F_(k−1) previously used in forming the residuals may be decomposed into orthogonal components and test statistics formed that entail the best combinations of inner products with these components.

In particular, for the second step the vector G₂ is formed, which is the part of F₁ orthogonal to G₁=Y. For j in J₂, the statistic 𝒵_(2,j)=X_(j)^(T)G₂/∥G₂∥ is computed as well as the combined statistic 𝒵_(2,j)^(comb)=√(λ₁) 𝒵_(1,j)−√(λ₂) 𝒵_(2,j), where λ₁=1−λ and λ₂=λ, with a value of λ to be specified. What is different on the second step is that now the events ℋ_(2,j)={𝒵_(2,j)^(comb)≧τ} are based on these 𝒵_(2,j)^(comb), which are inner products of X_(j) with the normalized vector E₂=√(λ₁) Y/∥Y∥−√(λ₂) G₂/∥G₂∥. To motivate these statistics note the residuals r₂=Y−F₁ may be written as (1−b̂₁)Y−G₂, where b̂₁=F₁^(T)Y/∥Y∥². The statistics used in this variant may be viewed as approximations to the corresponding statistics based on the normalized residuals r₂/∥r₂∥, except that the form of λ and the analysis are simplified.

Again these test statistics 𝒵_(2,j)^(comb) lead to the set thresh₂={jεJ₂: 1_(ℋ_(2,j))=1} of size above₂=Σ_(jεthresh₂) π_(j). Considering these statistics in order of decreasing value, it leads to the set dec₂ consisting of as many of these as can be while maintaining accept₂≦min{pace₂, above₂}, where accept₂=Σ_(jεdec₂) π_(j). This provides an additional part of the fit F₂=Σ_(jεdec₂) √(P_(j)) X_(j).

Proceed in this manner, iteratively, to perform the following loop of calculations, for k≧2. From the output of step k−1, there is available the vector F_(k−1), which is a part of the fit, and for k′<k there are previously stored vectors G_(k′) and statistics 𝒵_(k′,j). Plus there is a set dec_(1,k−1)=dec₁∪ . . . ∪dec_(k−1) already decoded on some previous step and a set J_(k)=J−dec_(1,k−1) of terms to test at step k. Consider, as discussed further below, the part G_(k) of F_(k−1) orthogonal to the previous G_(k′) and for each j not in dec_(k−1) compute

𝒵_(k,j)=X_(j)^(T)G_(k)/∥G_(k)∥

and the combined statistic

𝒵_(k,j)^(comb)=√(λ_(1,k)) 𝒵_(1,j)−√(λ_(2,k)) 𝒵_(2,j)− . . . −√(λ_(k,k)) 𝒵_(k,j),

where these λ will be specified with Σ_(k′=1)^(k) λ_(k′,k)=1. These positive weights will take the form λ_(k′,k)=w_(k′)/s_(k), with w₁=1 and s_(k)=1+w₂+ . . . +w_(k), with the w_(k) to be specified. Accordingly, the combined statistic may be computed by the update

𝒵_(k,j)^(comb)=√(1−λ_(k)) 𝒵_(k−1,j)^(comb)−√(λ_(k)) 𝒵_(k,j),

where λ_(k)=w_(k)/s_(k). This statistic may be thought of as the inner product of X_(j) with a vector updated as E_(k)=√(1−λ_(k)) E_(k−1)−√(λ_(k)) G_(k)/∥G_(k)∥, serving as a surrogate for r_(k)/∥r_(k)∥. For terms j in J_(k) these statistics 𝒵_(k,j)^(comb) are compared to a threshold, leading to the events ℋ_(k,j)={𝒵_(k,j)^(comb)≧τ}. The idea of these steps is that, as quantified by an analysis of the distribution of the statistics 𝒵_(k,j), there is an increasing separation between the distribution for terms j sent and the others.

Let thresh_(k)={jεJ_(k): 𝒵_(k,j)^(comb)≧τ_(k)} and above_(k)=Σ_(jεthresh_(k)) π_(j), and for a specified pace_(k), considering these test statistics in order of decreasing value, include in dec_(k) as many as can be with accept_(k)≦min{pace_(k), above_(k)}, where accept_(k)=Σ_(jεdec_(k)) π_(j). The output of step k is the vector

$F_{k} = \sum_{j \in dec_{k}} \sqrt{P_{j}}\, X_{j}.$

Also the vector G_(k) and the statistics 𝒵_(k,j) are appended to what was previously stored, for all terms not in the decoded set. From this step an update is provided to the set of decoded terms dec_(1,k)=dec_(1,k−1)∪dec_(k) and the set J_(k+1)=J_(k)−dec_(k) of terms remaining for consideration.

This completes the actions of step k of the loop.
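A compact sketch of one pass of this loop, under the simplification dec_k = thresh_k and assuming G_k is nonzero (otherwise the algorithm stops); the packaging into a single function is ours, not the patent's precise procedure:

    import numpy as np

    def loop_step(X, F_prev, Gs, Zcomb, lam_k, tau, active, powers):
        # G_k: part of F_{k-1} orthogonal to the stored G_1, ..., G_{k-1}
        G_k = F_prev.copy()
        for G in Gs:
            g2 = G @ G
            if g2 > 0:
                G_k -= (F_prev @ G / g2) * G       # coefficient b-hat_{k,k'}
        Z_k = X.T @ G_k / np.linalg.norm(G_k)      # Z_{k,j}
        Zcomb = np.sqrt(1 - lam_k) * Zcomb - np.sqrt(lam_k) * Z_k
        dec_k = active & (Zcomb >= tau)            # thresh_k within J_k
        F_k = X[:, dec_k] @ np.sqrt(powers[dec_k]) # step-k part of the fit
        Gs.append(G_k)
        return F_k, Zcomb, active & ~dec_k, Gs     # J_{k+1} = J_k - dec_k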

To complete the description of the decoder, the values of w_(k) that determine the λ_(k) need to be specified, and likewise pace_(k) is to be specified. For these specifications there will be a role for measures of the accumulated size of the detection set, accept_(k)^(tot)=Σ_(k′=1)^(k) accept_(k′), as well as a target lower bound q_(1,k) on the total weighted fraction of correct detections (the definition of which arises in a later section), and an adjustment to it given by q_(1,k)^(adj)=q_(1,k)/(1+f_(1,k)/q_(1,k)), where f_(1,k) is a target upper bound on the total weighted fraction of false alarms. The choices considered here take w_(k)=s_(k)−s_(k−1) to be increments of the sequence s_(k)=1/(1−x_(k−1)ν) that arises in characterizing the above-mentioned separation. In the definition of w_(k), the x_(k−1) is taken to be either accept_(k−1)^(tot) or q_(1,k−1)^(adj), both of which arise as surrogates for a corresponding unobservable quantity which would require knowledge of the actual fraction of correct detections through step k−1.

There are two options for pace_(k) that are described. First, one may arrange for dec_(k) to be all of thresh_(k) by setting pace_(k)=1, large enough that it has essentially no role, and with this option the w_(k) is set as above using x_(k−1)=accept_(k−1)^(tot). This choice yields a successful growth of the total weighted fractions of correct detections, though to handle the empirical character of w_(k) there is a slight cost to it in the reliability bound, not present with the second option.

For the second option, let pace_(k)=q_(1,k)^(adj)−q_(1,k−1)^(adj) be the deterministic increments of the increasing sequence q_(1,k)^(adj), with which it is shown that above_(k) is likely to exceed pace_(k), for each k. When it does, accept_(k) equals the value pace_(k), and cumulatively their sum accept_(k)^(tot) matches the target q_(1,k)^(adj). Likewise, for this option, w_(k) is set using x_(k−1)=q_(1,k−1)^(adj). Its deterministic trajectory facilitates the demonstration of reliability of the decoder.

On each step k the decoder uncovers a substantial part of what remains, because of the growth of the mean separation between terms sent and the others, as shall be seen.

The algorithm stops under the following conditions. Natural practical conditions are that L terms have been decoded, or that the weighted total size of the decoded set accept_(k)^(tot) has reached at least 1, or that no terms from J_(k) are found to have statistic above threshold, so that F_(k) is zero and the statistics would remain thereafter unchanged. An analytical condition is that the lower bound, to be obtained below, on the likely mean separation stops growing (captured through q_(1,k)^(adj) no longer increasing), so that no further improvement is theoretically demonstrable by such methodology. Subject to rate constraints near capacity, the best bounds obtained here occur with a total number of steps m equal to an integer part of 2+snr log B.

Up to step k, the total set of decoded terms is dec_(1,k), and the corresponding fit fit_(k) may be represented either as Σ_(jεdec_(1,k)) √(P_(j)) X_(j) or as the sum of the pieces from each step,

fit_(k)=F₁+F₂+ . . . +F_(k).

As to the part G_(k) of F_(k−1) orthogonal to G_(k′) for k′<k, take advantage of two ways to view it, one emphasizing computation and the other analysis.

For computation, work directly with parts of the fit. The G₁, G₂, . . . , G_(k−1) are orthogonal vectors, so the parts of F_(k−1) in these directions are b̂_(k,k′) G_(k′) with coefficients b̂_(k,k′)=F_(k−1)^(T)G_(k′)/∥G_(k′)∥² for k′=1, 2, . . . , k−1, where if peculiarly ∥G_(k′)∥=0 one uses b̂_(k,k′)=0. Accordingly, the new G_(k) may be computed from F_(k−1) and the previous G_(k′) with k′<k by

$G_{k} = F_{k-1} - \sum_{k^{\prime}=1}^{k-1} \hat{b}_{k,k^{\prime}}\, G_{k^{\prime}}.$

This computation entails the n-fold sums of products F_(k−1)^(T)G_(k′) for determination of the b̂_(k,k′). Then from this computed G_(k) obtain the inner products with the X_(j) to yield 𝒵_(k,j)=X_(j)^(T)G_(k)/∥G_(k)∥ for j in J_(k).

The algorithm is seen to perform an adaptive Gram-Schmidt orthogonalization, creating orthogonal vectors G_(k) used in representation of the X_(j) and linear combinations of them, in directions suitable for extracting statistics of appropriate discriminatory power, starting from the received Y. For the classical Gram-Schmidt process, one has a pre-specified set of vectors which are successively orthogonalized, at each step, by finding the part of the current vector that is orthogonal to the previous vectors. Here instead, for each step, the vector F_(k−1), for which one finds the part G_(k) orthogonal to the vectors G₁, . . . , G_(k−1), is not pre-specified. Rather, it arises from thresholding statistics extracted in creating these vectors.

For analysis, look at what happens to the representation of the individual terms. Each term X_(j) for jεJ_(k−1) has the decomposition

$X_{j} = \mathcal{Z}_{1,j}\frac{G_{1}}{\|G_{1}\|} + \mathcal{Z}_{2,j}\frac{G_{2}}{\|G_{2}\|} + \ldots + \mathcal{Z}_{k-1,j}\frac{G_{k-1}}{\|G_{k-1}\|} + V_{k,j},$

where V_(k,j) is the part of X_(j) orthogonal to G₁, G₂, . . . , G_(k−1). Since

$F_{k-1} = \sum_{j \in dec_{k-1}} \sqrt{P_{j}}\, X_{j},$

it follows that G_(k) has the representation

$G_{k} = \sum_{j \in dec_{k-1}} \sqrt{P_{j}}\, V_{k,j},$

from which 𝒵_(k,j)=V_(k,j)^(T)G_(k)/∥G_(k)∥, and one has the updated representation

$X_{j} = \mathcal{Z}_{1,j}\frac{G_{1}}{\|G_{1}\|} + \ldots + \mathcal{Z}_{k-1,j}\frac{G_{k-1}}{\|G_{k-1}\|} + \mathcal{Z}_{k,j}\frac{G_{k}}{\|G_{k}\|} + V_{k+1,j}.$

With the initialization V_(1,j)=X_(j), these V_(k+1,j) may be thought of as iteratively obtained from the corresponding vectors at the previous step, that is,

V_(k+1,j)=V_(k,j)−𝒵_(k,j) G_(k)/∥G_(k)∥.

These V do not actually need to be computed, nor do the components detailed below, but this representation of the terms X_(j) is used in obtaining distributional properties of the 𝒵_(k,j).

3.2 The Weighted Fractions of Detections and Alarms:

The weights π_(j)=P_(j)/P sum to 1 across j in sent and they sum to B−1 across j in other. Define in general

$\hat{q}_{k} = \sum_{j \in sent \cap dec_{k}} \pi_{j}$

for the step k correct detections and

$\hat{f}_{k} = \sum_{j \in other \cap dec_{k}} \pi_{j}$

for the false alarms. In the case P_(j)=P/L, which assigns equal weight π_(j)=1/L, q̂_(k)L is the increment to the number of correct detections on step k; likewise f̂_(k)L is the increment to the number of false alarms. Their sum accept_(k)=q̂_(k)+f̂_(k) matches Σ_(jεdec_(k)) π_(j).
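In code, these step-k quantities are one-liners (the boolean masks and names are ours, for illustration):

    import numpy as np

    def step_fractions(dec_k, sent_mask, pi):
        q_hat = pi[dec_k & sent_mask].sum()    # weighted correct detections
        f_hat = pi[dec_k & ~sent_mask].sum()   # weighted false alarms
        return q_hat, f_hat                    # their sum is accept_k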

The total weighted fraction of correct detections up to step k is q̂_(k)^(tot)=Σ_(jεsent∩dec_(1,k)) π_(j), which may be written as the sum

q̂_(k)^(tot)=q̂₁+q̂₂+ . . . +q̂_(k).

Assume for now that dec_(k)=thresh_(k). Then these increments q̂_(k) equal Σ_(jεsent∩J_(k)) π_(j) 1_(ℋ_(k,j)).

The decoder only encounters these events ℋ_(k,j)={𝒵_(k,j)^(comb)>τ} for j not decoded on previous steps, i.e., for j in J_(k)=(dec_(1,k−1))^(c). For each step k, one may define the statistics arbitrarily for j in dec_(1,k−1), so as to fill out the definition of the events ℋ_(k,j) for each j, in a manner convenient for analysis. By induction on k, one sees that dec_(1,k) consists of the terms j for which the union event ℋ_(1,j)∪ . . . ∪ℋ_(k,j) occurs. Because if dec_(1,k−1)={j: 1_(ℋ_(1,j)∪ . . . ∪ℋ_(k−1,j))=1}, then the decoded set dec_(1,k) consists of terms for which either ℋ_(1,j)∪ . . . ∪ℋ_(k−1,j) occurs (previously decoded) or ℋ_(k,j)∩[ℋ_(1,j)∪ . . . ∪ℋ_(k−1,j)]^(c) occurs (newly decoded), and together these events constitute the union ℋ_(1,j)∪ . . . ∪ℋ_(k,j).

Accordingly, the total weighted fraction of correct detections q̂_(k)^(tot) may be regarded as the same as the π-weighted measure of the union

$\hat{q}_{k}^{tot} = \sum_{j\, \in\, sent} \pi_{j}\, 1_{\{\mathcal{H}_{1,j} \cup \ldots \cup \mathcal{H}_{k,j}\}}.$

Indeed, to relate this expression to the preceding expression for the sum for q̂_(k)^(tot), the sum for k′ from 1 to k corresponds to the representation of the union as the disjoint union of contributions from terms sent that are in ℋ_(k′,j) but not in earlier such events.

Likewise the weighted count of false alarms f̂_(k)^(tot)=Σ_(jεother∩dec_(1,k)) π_(j) may be written as

f̂_(k)^(tot)=f̂₁+f̂₂+ . . . +f̂_(k),

which when dec_(k)=thresh_(k) may be expressed as

$\hat{f}_{k}^{tot} = \sum_{j\, \in\, other} \pi_{j}\, 1_{\{\mathcal{H}_{1,j} \cup \ldots \cup \mathcal{H}_{k,j}\}}.$

In the distributional analysis that follows, the mean separation is shown to be given by an expression inversely related to 1−q̂_(k−1)^(tot)ν. The idea of the multi-step algorithm is to accumulate enough correct detections in q̂_(k)^(tot), with an attendant low number of false alarms, that the fraction that remains becomes small enough, and the mean separation hence pushed large enough, that most of what remains is reliably decoded on the last step.

The analysis will provide, for each section l, lower bounds on the probability that the correct term is above threshold by step k and upper bounds on the accumulated false alarms. When the snr is low and a constant power allocation is used, these probabilities are the same across the sections, all of which remain active for consideration until completion of the steps.

For variable power allocation, with P_((l)) decreasing in l, then for each step k, the probability that the correct term is above threshold varies with l. Nevertheless, it can be a rather large number of sections for which this probability takes an intermediate value (neither small nor close to one), thereby necessitating the adaptive decoding. Most of the analysis here proceeds by allowing at each step for terms to be detected from any section l=1, 2, . . . , L.

3.3 An Optional Analysis Window:

For large C, the P_((l)) proportional to e^(−2Cl/L) exhibits a strong decay with increasing l. Then it can be appropriate to take advantage of a deterministic decomposition into three sets of sections at any given number of steps. There is the set of sections with small l, called polished, where the probability of the correct term being above threshold before step k is already sufficiently close to one that it is known in advance that it will not be necessary to continue to check these (as the subsequent false alarm probability would be quantified as larger than the small remaining improvement to the correct detection probability for that section). Let polished_(k) (initially empty) be the set of terms in these sections. With the power decreasing, this coincides with a non-decreasing initial interval of sections.

Likewise there are the sections with large l where the probability of a correct detection on step k is less than the probability of false alarm, so it would be advantageous to still leave them untested. Let untested_(k) (desirably eventually empty) be the set of terms from these sections, corresponding to a decreasing tail interval of sections up to the last section L.

The complement is a middle region of terms

potential_(k)=J−polished_(k)−untested_(k),

corresponding to a window of sections, left_(k)≦l≦right_(k), worthy of attention in analyzing the performance at step k. For each term in this analysis window there is a reasonable chance (neither too high nor too low) of it being decoded by the completion of this step.

These middle regions overlap across k, so that any given term j has potential for being decoded in any of several steps.

In any particular realization of X, Y, some terms in this set potential_(k) are already in dec_(1,k−1). Accordingly, one has the option at step k to restrict the active set of the search to J_(k)=potential_(k)∩dec_(1,k−1)^(c) rather than searching all of the set dec_(1,k−1)^(c) not previously decoded. In this case one modifies the definitions of q̂_(k)^(tot) and f̂_(k)^(tot) to be

$\hat{q}_{k}^{tot} = \sum_{j\, \in\, sent} \pi_{j}\, 1_{\{\bigcup_{k^{\prime} \in K_{j,k}} \mathcal{H}_{k^{\prime},j}\}}$ and $\hat{f}_{k}^{tot} = \sum_{j\, \in\, other} \pi_{j}\, 1_{\{\bigcup_{k^{\prime} \in K_{j,k}} \mathcal{H}_{k^{\prime},j}\}},$ where K_(j,k)={k′≦k: jεpotential_(k′)}.

This refinement allows for analysis to show reduction in the total false alarms and corresponding improvement to the rate drop from capacity, when C is large.

4 Distributional Analysis

In this section the distributional properties of the random variables 𝒵_(k)=(𝒵_(k,j): jεJ_(k)) for each k=1, 2, . . . , n are described. In particular it is shown for each k that the 𝒵_(k,j) are location-shifted normal random variables with variance near one for jεsent∩J_(k) and are independent standard normal random variables for jεother∩J_(k).

In Lemma 1 below the distributional properties of 𝒵₁ are derived. Lemma 2 characterizes the distribution of 𝒵_(k) for steps k≧2.

Before providing these lemmas a few quantities are defined which will be helpful in studying the location shifts of 𝒵_(k,j) for jεsent∩J_(k). In particular, define the quantity

C_(j,R)=π_(j)Lν/(2R),

where π_(j)=P_(j)/P and ν=ν₁=P/(σ²+P). Likewise define

C_(j,R,B)=C_(j,R)(2 log B),

which also has the representation

C_(j,R,B)=nπ_(j)ν.

The role of this quantity as developed below is via the location shift √(C_(j,R,B)), seen to be near √(C_(j,R)) τ. One compares this value to τ, that is, one compares C_(j,R) to 1, to see when there is a reasonable probability of some correct detections starting at step 1, and one arranges C_(j,R) to taper not too rapidly, to allow decodings to accumulate on successive steps.

Recalling that π_(j)=π_((l))=P_((l))/P for j in section l, also denote the quantities defined above as C_(l,R)=π_((l))Lν/(2R) and C_(l,R,B)=C_(l,R)(2 log B), which is nπ_((l))ν.

Here are two illustrative cases. For the constant power allocation case, π_((l)) equals 1/L and C_(l,R) reduces to

C_(l,R)=R₀/R,

where R₀=(½)P/(σ²+P). This C_(l,R) is at least 1 when the rate R is not more than R₀.

For the case of power P_((l)) proportional to e^(−2Cl/L), the value of π_((l)) is likewise proportional to e^(−2Cl/L), for l from 1 to L. Define C̃=(L/2)[1−e^(−2C/L)], which is essentially identical to C, for L large compared to C; here ν=1−e^(−2C). Then

π_((l))=(2C̃/(Lν)) e^(−2C(l−1)/L)

and

C_(l,R)=(C̃/R) e^(−2C(l−1)/L).

For rates R not more than C̃, this C_(l,R) is at least 1 in some sections, leading to likelihood of some initial successes, and it tapers at the fastest rate at which decoding successes can still accumulate.

4.1 Distributional Analysis of the First Step:

The lemma for the distribution of 𝒵₁ is now given. Recall that J₁=J is the set of all N indices.

Lemma 1.

For each jεJ₁, the statistic 𝒵_(1,j) can be represented as

√(C_(j,R,B))[χ_(n)/√n] 1_(j sent)+Z_(1,j),

where Z₁=(Z_(1,j): jεJ₁) is multivariate normal N(0,Σ₁) and χ_(n)²=∥Y∥²/σ_(Y)² is a Chi-square(n) random variable that is independent of Z₁. Here recall that σ_(Y)²=P+σ² is the variance of each coordinate of Y.

The covariance matrix Σ₁ can be expressed as Σ₁=I−b₁b₁^(T), where b₁ is the vector with entries b_(1,j)=β_(j)/σ_(Y) for j in J.

The subscript 1 on the matrix Σ₁ and the vector b₁ is to distinguish these first step quantities from those that arise on subsequent steps.

Demonstration of Lemma 1:

Recall that the X_(j) for j in J are independent N(0, I) random vectors and that Y=Σ_(j) β_(j)X_(j)+ε, where the sum of squares of the β_(j) is equal to P.

Consider the decomposition of each random vector X_(j) of the dictionary into a vector in the direction of the received Y and a vector U_(j) uncorrelated with Y. That is, one considers the reverse regression

X_(j)=b_(1,j) Y/σ_(Y)+U_(j),

where the coefficient is b_(1,j)=𝔼[X_(i,j)Y_(i)]/σ_(Y)=β_(j)/σ_(Y), which indeed makes each coordinate of U_(j) uncorrelated with each coordinate of Y. These coefficients collect into a vector b₁=β/σ_(Y) in ℝ^(N).

These vectors U_(j)=X_(j)−b_(1,j)Y/σ_(Y) along with Y are linear combinations of joint normal random variables and so are also joint normal, with zero correlation implying that Y is independent of the collection of U_(j). The independence of Y and the U_(j) facilitates development of distributional properties of the U_(j)^(T)Y. For these purposes obtain the characteristics of the joint distribution of the U_(j) across terms j (clearly there is independence for distinct time indices i).

The coordinates of U_(j) and U_(j′) have mean zero and expected product 1_({j=j′})−b_(1,j)b_(1,j′). These covariances (𝔼[U_(i,j)U_(i,j′)]: j,j′εJ) organize into a matrix

Σ₁=Σ=I−Δ=I−bb^(T).

For any constant vector α≠0, consider U_(j)^(T)α/∥α∥. Its joint normal distribution across terms j is the same for any such α. Specifically, it is a normal N(0,Σ), with mean zero and the indicated covariances.

Likewise define the random variables Z_(j)=U_(j)^(T)Y/∥Y∥, also denoted Z_(1,j) when making explicit that it is for the first step. Jointly across j, these Z_(j) have the normal N(0,Σ) distribution, independent of Y. Indeed, since the U_(j) are independent of Y, when conditioned on Y=α one gets the same N(0,Σ) distribution, and since this conditional distribution does not depend on Y, it is the unconditional distribution as well.

What this leads to is revealed via the representation of the inner product X_(j)^(T)Y as b_(1,j)∥Y∥²/σ_(Y)+U_(j)^(T)Y, which can be written as

$X_{j}^{T}Y = \beta_{j}\frac{\|Y\|^{2}}{\sigma_{Y}^{2}} + \|Y\|\, Z_{j}.$

This identifies the distribution of the X_(j)^(T)Y as that obtained as a mixture of the normal Z_(j) with scale and location shifts determined by an independent random variable χ_(n)²=∥Y∥²/σ_(Y)², distributed as Chi-square with n degrees of freedom.

Divide through by ∥Y∥ to normalize these inner products to a helpful scale and to simplify the distribution of the result to be only that of a location mixture of normals. The resulting random variables 𝒵_(1,j)=X_(j)^(T)Y/∥Y∥ take the form

𝒵_(1,j)=√n b_(1,j)|Y|/σ_(Y)+Z_(j),

where |Y|/σ_(Y)=χ_(n)/√n is near 1. Note that √n b_(1,j)=√n β_(j)/σ_(Y), which is √(nπ_(j)ν) or √(C_(j,R,B)). This completes the demonstration of Lemma 1.
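A Monte Carlo sanity check of this representation is sketched below (illustrative sizes): the average of 𝒵_(1,j) over terms sent should be near √(nπ_(j)ν), here √(nν/L) for equal power.

    import numpy as np

    rng = np.random.default_rng(1)
    L, B, n, P, sigma2 = 32, 64, 512, 1.0, 1.0
    N, nu = L * B, P / (P + sigma2)
    shifts = []
    for _ in range(50):
        X = rng.standard_normal((n, N))
        sent = np.arange(L) * B                 # say, first term of each section
        beta = np.zeros(N)
        beta[sent] = np.sqrt(P / L)
        Y = X @ beta + np.sqrt(sigma2) * rng.standard_normal(n)
        Z1 = X.T @ Y / np.linalg.norm(Y)
        shifts.append(Z1[sent].mean())
    print(np.mean(shifts), np.sqrt(n * nu / L)) # empirical vs. predicted shift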

The above proof used the population reverse regression of X_(j) onto Y, in which the coefficient b_(1,j) arises as a ratio of expected products. There is also a role for the empirical projection decomposition, the first step of which is X_(j)=𝒵_(1,j)Y/∥Y∥+V_(2,j), with G₁=Y. Its additional steps provide the basis for additional distributional analysis.

4.2 Distributional Analysis of Steps k≧2:

Let V_(k,j) be the part of X_(j) orthogonal to G₁, G₂, . . . , G_(k−1), from which G_(k) is obtained as Σ_(jεdec_(k−1)) √(P_(j)) V_(k,j). It yields the representation of the statistic 𝒵_(k,j)=X_(j)^(T)G_(k)/∥G_(k)∥ as V_(k,j)^(T)G_(k)/∥G_(k)∥, as said. Amongst other matters, the proof of the following lemma determines, for jεJ_(k), the ingredients of the regression V_(k,j)=b_(k,j)G_(k)/σ_(k)+U_(k,j), in which U_(k,j) is found to be a mean zero normal random vector independent of G_(k), conditioning on certain statistics from previous steps. Taking the inner product with the unit vector G_(k)/∥G_(k)∥ yields a representation of 𝒵_(k,j) as a mean zero normal random variable Z_(k,j) plus a location shift that is a multiple of ∥G_(k)∥, depending on whether j is in sent or not. The definition of Z_(k,j) is U_(k,j)^(T)G_(k)/∥G_(k)∥.

Here the pattern used in Lemma 1 is maintained, using the calligraphic font 𝒵_(k,j) to denote the test statistics that incorporate the shift for j in sent and using the standard font Z_(k,j) to denote their counterpart mean zero normal random variables before the shift.

The lemma below characterizes the sequence of conditional distributions of the Z_(k)=(Z_(k,j): jεJ_(k)) and ∥G_(k)∥, given ℱ_(k−1), for k=1, 2, . . . , n, where

ℱ_(k−1)=(∥G_(k′)∥, Z_(k′): k′=1, . . . , k−1).

This determines also the distribution of 𝒵_(k)=(𝒵_(k,j): jεJ_(k)) conditional on ℱ_(k−1). Initializing with the distribution of 𝒵₁ derived in Lemma 1, the conditional distributions for all 2≦k≦n are provided. The algorithm will be arranged to stop long before n, so these properties are needed only up to some much smaller final k=m. Note that J_(k) is never empty because at most L are decoded, so there must always be at least (B−1)L remaining. For an index set which may depend on the conditioning variables, let N_(J_(k))(0,Σ) denote a mean zero multivariate normal distribution with index set J_(k) and the indicated covariance matrix.

Lemma 2.

For k≧2, given ℱ_(k−1), the conditional distribution of Z_(k,J_(k))=(Z_(k,j): jεJ_(k)) is normal N_(J_(k))(0,Σ_(k)); the random variable χ_(d_(k))²=∥G_(k)∥²/σ_(k)² is Chi-square distributed, with d_(k)=n−k+1 degrees of freedom, conditionally independent of the Z_(k), where σ_(k)² depends on ℱ_(k−1) and is strictly positive provided there was at least one term above threshold on step k−1; and, moreover, 𝒵_(k,j) has the representation

−√(ŵ_(k) C_(j,R,B))[χ_(d_(k))/√n] 1_(j sent)+Z_(k,j).

The shift factor ŵ_(k) is the increment ŵ_(k)=ŝ_(k)−ŝ_(k−1) of the series ŝ_(k) with

$1 + \hat{w}_{2} + \ldots + \hat{w}_{k} = \hat{s}_{k} = \frac{1}{1 - (\hat{q}_{1}^{adj} + \ldots + \hat{q}_{k-1}^{adj})\,\nu},$

where q̂_(j)^(adj)=q̂_(j)/(1+f̂_(j)/q̂_(j)), determined from weighted fractions of correct detections and false alarms on previous steps. Here ŝ₁=ŵ₁=1. The ŵ_(k) is strictly positive, that is, ŝ_(k) is increasing, as long as q̂_(k−1)>0, that is, as long as the preceding step had at least one correct term above threshold. The covariance Σ_(k) has the representation

Σ_(k)=I−δ_(k)δ_(k)^(T)=I−ν_(k)ββ^(T)/P,

where ν_(k)=ŝ_(k)ν and (Σ_(k))_(j,j′)=1_(j=j′)−δ_(k,j)δ_(k,j′), for j, j′ in J_(k), where the vector δ_(k) is in the direction β, with δ_(k,j)=√(ν_(k)P_(j)/P) 1_(j sent) for j in J_(k). Finally,

$\sigma_{k}^{2} = \frac{\hat{s}_{k-1}}{\hat{s}_{k}}\, accept_{k-1}\, P,$

where accept_(k)=Σ_(jεdec_(k)) π_(j) is the size of the decoded set on step k.

The demonstration of this lemma is found in the appendix, section 14.1. It follows the same pattern as the demonstration of Lemma 1 with some additional ingredients.

4.3 The Nearby Distribution:

Two joint probability measures ℙ and ℚ are now specified for all the Z_(k,j), jεJ, and the ∥G_(k)∥ for k=1, . . . , m. For ℙ, it is to have the conditionals specified above.

The ℚ is the approximating distribution. Choose ℚ to make all the Z_(k,j), for jεJ, for k=1, 2, . . . , m, be independent standard normal and, like ℙ, choose ℚ to make the χ_(n−k+1)²=∥G_(k)∥²/σ_(k)² be independent Chi-square(n−k+1) random variables.

Fill out the specification of the distribution assigned by ℙ via a sequence of conditionals for Z_(k,J)=(Z_(k,j): jεJ), that is, for all j in J, not just for j in J_(k). Here ℱ_(k)^(full)=(∥G_(k′)∥, Z_(k′,J): k′=1, 2, . . . , k). For the variables Z_(k,J_(k)) that are actually used, the conditional distribution is that specified in the above Lemma. Whereas for the Z_(k,j) with j in the already decoded set J−J_(k)=dec_(1,k−1), given ℱ_(k−1)^(full), it is convenient to arrange them to have the same independent standard normal distribution as is used by ℚ. This completes the definition of the Z_(k,j) for all j, and with it one likewise extends the definition of 𝒵_(k,j) as a function of Z_(k,j) and ∥G_(k)∥ and completes the definition of the events ℋ_(k,j) for all j, used in the analysis.

This choice of independent standard normal for the distribution of Z_(k,j) given ℱ_(k−1)^(full) for j in dec_(1,k−1) is contrary to what would have arisen in the proof of Lemma 2 from the inner product of U_(k,j) with G_(k)/∥G_(k)∥, if one were to have looked there at such j with 1_(ℋ_(k′,j))=1 for earlier k′<k. Nevertheless, as said, there is freedom of choice of the distribution of these variables not used by the decoder. The present choice is a simpler extension providing a conditional distribution of (Z_(k,j): jεJ) that shares the same marginalization to the true distribution of (Z_(k,j): jεJ_(k)) given ℱ_(k−1).

An event A is said to be determined by ℱ_(k) if its indicator is a function of ℱ_(k). As ℱ_(k)=(χ_(n−k′+1)², Z_(k′,J_(k′)): k′≦k), with a random index set J_(k′) given as a function of the preceding ℱ_(k′−1), this might be regarded as a tricky matter. Alternatively a random variable may be said to be determined by ℱ_(k) if it is measurable with respect to the collection of random variables (∥G_(k′)∥, Z_(k′,j) 1_({jεdec_(1,k′−1)^(c)}): jεJ, 1≦k′≦k). The multiplication by the indicator removes the effect on step k′ of any Z_(k′,j) decoded on earlier steps, that is, any j outside J_(k′). Operationally, no advanced measure-theoretic notions are required, as the sequences of conditional densities being worked with have explicit Gaussian form.

In the following lemma appeal is made to a sense of closeness of the distribution ℙ to ℚ, such that events exponentially unlikely under ℚ remain exponentially unlikely under the governing measure ℙ.

Lemma 3.

For any event A determined by ℱ_(k),

ℙ[A]≦ℚ[A]e^(kc₀),

where c₀=(½)log(1+P/σ²). The analogous statement holds more generally for the expectation of any non-negative function of ℱ_(k).

See the appendix, subsection 14.2, for the proof. The fact that c₀ matches the capacity C might be interesting, but it is not consequential to the argument. What matters for us is simply that if ℚ[A] is exponentially small in L or n, then so is ℙ[A].

4.4 Logic in Bounding Detections and False Alarms:

Simple logic concerning unions plays an important simplifying role in the analysis here, to lower bound detection rates and to upper bound false alarms. The idea is to avoid the distributional complication of sums restricted to terms not previously above threshold.

Here assume that dec_(k)=thresh_(k) each step. Section 7.2 discusses an alternative approach where dec_(k) is taken to be a particular subset of thresh_(k), to demonstrate slightly better reliability bounds for given rates below capacity.

Recall that with q̂_(k)=Σ_(jεsent∩J_(k)) π_(j) 1_(ℋ_(k,j)) as the increment of weighted fraction of correct detections, the total weighted fraction of correct detections q̂_(k)^(tot)=q̂₁+ . . . +q̂_(k) up to step k is the same as the weighted fraction of the union, Σ_(j sent) π_(j) 1_(ℋ_(1,j)∪ . . . ∪ℋ_(k,j)). Accordingly, it has the lower bound

$\hat{q}_{k}^{tot} \geq \sum_{j\, \in\, sent} \pi_{j}\, 1_{\mathcal{H}_{k,j}}$

based solely on the step k half-spaces, where the sum on the right is over all j in sent, not just those in sent∩J_(k). That this simpler form will be an effective lower bound on q̂_(k)^(tot) will arise from the fact that the statistic tested in ℋ_(k,j) is approximately a normal with a larger mean at step k than at steps k′<k, producing for all j in sent greater likelihood of occurrence of ℋ_(k,j) than of the earlier ℋ_(k′,j).

Concerning this lower bound Σ_(j sent) π_(j) 1_(ℋ_(k,j)), in what follows it is convenient to set q̂_(1,k) to be the corresponding sum Σ_(j sent) π_(j) 1_(H_(k,j)) using a simpler purified form H_(k,j) in place of ℋ_(k,j). Outside of an exception event studied herein, this H_(k,j) is a smaller set than ℋ_(k,j), and so then q̂_(k)^(tot) is at least q̂_(1,k).

Meanwhile, with f̂_(k)=Σ_(jεother∩J_(k)) π_(j) 1_(ℋ_(k,j)) as the increment of weighted count of false alarms, as seen, the total weighted count of false alarms f̂_(k)^(tot)=f̂₁+ . . . +f̂_(k) is the same as Σ_(j other) π_(j) 1_(ℋ_(1,j)∪ . . . ∪ℋ_(k,j)). It has the upper bound

$\hat{f}_{k}^{tot} \leq \sum_{j\, \in\, other} \pi_{j}\, 1_{\mathcal{H}_{1,j}} + \ldots + \sum_{j\, \in\, other} \pi_{j}\, 1_{\mathcal{H}_{k,j}}.$

Denote the right side of this bound f̂_(1,k).

These simple inequalities permit establishment of likely levels of correct detections and false alarm bounds by analyzing the simpler forms Σ_(j sent) π_(j) 1_(ℋ_(k,j)) and Σ_(j other) π_(j) 1_(ℋ_(k′,j)) without the restriction to the random set J_(k), which would complicate the analysis.

Refinement Using Wedges:

Rather than using the last half-space ℋ_(k,j) alone, one may obtain a lower bound on the indicator of the union ℋ_(1,j)∪ . . . ∪ℋ_(k,j) by noting that it contains ℋ_(k−1,j)∪ℋ_(k,j), expressed as the disjoint union of the events ℋ_(k,j) and ℋ_(k−1,j)∩ℋ_(k,j)^(c). The latter event may be interpreted as a wedge (an intersection of two half-spaces) in terms of the pair of random variables 𝒵_(k−1,j)^(comb) and 𝒵_(k,j). Accordingly, there is the refined lower bound on q̂_(k)^(tot)=Σ_(j sent) π_(j) 1_(ℋ_(1,j)∪ . . . ∪ℋ_(k,j)), given by

$\hat{q}_{k}^{tot} \geq \sum_{j\, \in\, sent} \pi_{j}\, 1_{\mathcal{H}_{k,j}} + \sum_{j\, \in\, sent} \pi_{j}\, 1_{\mathcal{H}_{k-1,j} \cap \mathcal{H}_{k,j}^{c}}.$

With this refinement a slightly improved bound on the likely fraction of correct detections can be computed from determination of lower bounds on the wedge probabilities. One could introduce additional terms from intersections of three or more half-spaces, but it is believed that these will have negligible effect.

Likewise, for the false alarms, the union ℋ_(1,j)∪ . . . ∪ℋ_(k,j), expressed as the disjoint union of ℋ_(k,j), ℋ_(k−1,j)∩ℋ_(k,j)^(c), . . . , ℋ_(1,j)∩ℋ_(2,j)^(c)∩ . . . ∩ℋ_(k,j)^(c), has the improved upper bound for its indicator given by the sum

1_(ℋ_(k,j))+1_(ℋ_(k−1,j)∩ℋ_(k,j)^(c))+ . . . +1_(ℋ_(1,j)∩ℋ_(2,j)^(c)),

given by just one half-space indicator and k−1 wedge indicators. Accordingly, the weighted total fraction of false alarms f̂_(k)^(tot) is upper-bounded by the π-weighted sum of these indicators for j in other. This leads to improved bounds on the likely fraction of false alarms from determination of upper bounds on wedge probabilities.

Accounting with the Optional Analysis Window:

In the optional restriction to terms in the set pot_(k)=potential_(k) for each step, the q̂_(k) take the same form but with J_(k)=pot_(k)∩dec_(1,k−1)^(c) in place of J_(k)=J∩dec_(1,k−1)^(c). Accordingly the total weighted count of correct detections q̂_(k)^(tot)=q̂₁+ . . . +q̂_(k) satisfies

$\hat{q}_{k}^{tot} \geq \sum_{j\, \in\, sent} \pi_{j}\, 1_{\{\bigcup_{k^{\prime} \in K_{j,k}} \mathcal{H}_{k^{\prime},j}\}},$

where the union for term j is taken for steps in the set K_(j,k)={k′≦k: jεpot_(k′)}. These unions are non-empty for the terms j in pot_(1,k)=pot₁∪ . . . ∪pot_(k). For terms in sent it will be arranged that for each j there is, as k′ increases, an increasing probability (of purified approximations) of the set ℋ_(k′,j). Accordingly, for a lower bound on the indicator of the union using a single set, use 1_(ℋ_(max_(k,j),j)), where max_(k,j) is the largest of {k′≦k: jεpot_(k′)}. Thus in place of Σ_(j sent) π_(j) 1_(ℋ_(k,j)), for the lower bound on the total weighted fraction of correct detections this leads to

$\hat{q}_{k}^{tot} \geq \sum_{j \in sent \cap pot_{1,k}} \pi_{j}\, 1_{\mathcal{H}_{\max_{k,j},j}}.$

Likewise an upper bound on the total weighted fraction of false alarms is

$\hat{f}_{k}^{tot} \leq \sum_{j \in other \cap pot_{1}} \pi_{j}\, 1_{\mathcal{H}_{1,j}} + \ldots + \sum_{j \in other \cap pot_{k}} \pi_{j}\, 1_{\mathcal{H}_{k,j}}.$

Again the idea is to have these simpler forms with single half-space events, but now with each sum taken over a more targeted deterministic set, permitting a smaller total false alarm bound.

This document does not quantify specifics of the benefits of the wedges and of the narrowed analysis window (or a combination of both). This is a matter of avoiding complication. But the matter can be revisited to produce improved quantification of mistake bounds.

4.5 Adjusted Sums Replace Sums of Adjustments:

The manner in which the quantities q̂₁, . . . , q̂_(k) and f̂₁, . . . , f̂_(k) arise in the distributional analysis of Lemma 2 is through the sum

q̂_(k)^(adj,tot)=q̂₁^(adj)+ . . . +q̂_(k)^(adj)

of the adjusted values q̂_(k)^(adj)=q̂_(k)/(1+f̂_(k)/q̂_(k)). Conveniently, by Lemma 4 below, q̂_(k)^(adj,tot)≧q̂_(k)^(tot,adj). That is, the total of adjusted increments is at least the adjusted total given by

$\hat{q}_{k}^{tot,adj} = \frac{\hat{q}_{k}^{tot}}{1 + \hat{f}_{k}^{tot}/\hat{q}_{k}^{tot}},$

which may also be written

$\hat{q}_{k}^{tot} - \hat{f}_{k}^{tot} + \frac{(\hat{f}_{k}^{tot})^{2}}{\hat{q}_{k}^{tot} + \hat{f}_{k}^{tot}}.$

In terms of the total weighted count of tests above threshold, accept_(k)^(tot)=q̂_(k)^(tot)+f̂_(k)^(tot), it is

$\hat{q}_{k}^{tot,adj} = accept_{k}^{tot} - 2\hat{f}_{k}^{tot} + \frac{(\hat{f}_{k}^{tot})^{2}}{accept_{k}^{tot}}.$

Lemma 4.

Let f₁, . . . , f_(k) and g₁, . . . , g_(k) be non-negative numbers. Then

$\frac{g_{1}}{1 + f_{1}/g_{1}} + \ldots + \frac{g_{k}}{1 + f_{k}/g_{k}} \geq \frac{g_{1} + \ldots + g_{k}}{1 + (f_{1} + \ldots + f_{k})/(g_{1} + \ldots + g_{k})}.$

Moreover, both of these quantities exceed

(g₁+ . . . +g_(k))−(f₁+ . . . +f_(k)).

Demonstration of Lemma 4:

Form p_(k′)=f_(k′)/[f₁+ . . . +f_(k)] and interpret these as probabilities for a random variable K taking values k′ from 1 to k. Consider the convex function defined by ψ(x)=x/(1+1/x). After accounting for the normalization, the left side is 𝔼[ψ(g_(K)/f_(K))] and the right side is ψ[𝔼(g_(K)/f_(K))]. So the first claim holds by Jensen's inequality. The second claim is because g/(1+f/g) equals g−f/(1+f/g), or equivalently g−f+f²/(g+f), which is at least g−f. This completes the demonstration of Lemma 4.
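A quick numerical check of the two claims (arbitrary non-negative inputs, for illustration):

    import numpy as np

    rng = np.random.default_rng(2)
    g, f = rng.random(5), 0.1 * rng.random(5)
    lhs = np.sum(g / (1 + f / g))             # total of adjusted increments
    rhs = g.sum() / (1 + f.sum() / g.sum())   # adjusted total
    print(lhs >= rhs >= g.sum() - f.sum())    # True, as Lemma 4 asserts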

This lemma is used to assert that ŝ_(k)=1/(1−q̂_(k−1)^(adj,tot)ν) is at least 1/(1−q̂_(k−1)^(tot,adj)ν). For suitable weights of combination this ŝ_(k) corresponds to a total shift factor, as developed in the next section.

5 Separation Analysis

In this section the extent of separation is explored between the distributions of the test statistics 𝒵_(k,j)^(comb) for j sent versus other j. In essence, for j sent, the distribution is a shifted normal. The assignment of the weights λ used in the definition of 𝒵_(k,j)^(comb) is arranged so as to approximately maximize this shift.

5.1 The Shift of the Combined Statistic:

Concerning the weights λ_(1,k), λ_(2,k), . . . , λ_(k,k), for notational simplicity hide the dependence on k and denote them simply by λ₁, . . . , λ_(k), as elements of a vector λ. This λ is to be a member of the simplex S_(k)={λ: λ_(k′)≧0, Σ_(k′=1)^(k) λ_(k′)=1} in which the coordinates are non-negative and sum to 1.

With weight vector λ the combined test statistic 𝒵_(λ,k,j)^(comb) takes the form

shift_(λ,k,j) 1_({j sent})+Z_(λ,k,j)^(comb),

where

Z_(λ,k,j)^(comb)=√(λ₁) Z_(1,j)−√(λ₂) Z_(2,j)− . . . −√(λ_(k)) Z_(k,j).

For convenience of analysis, it is defined not just for jεJ_(k), but indeed for all jεJ, using the normal distribution for the Z_(k′,j) discussed above. Here

shift_(λ,k,j)=shift_(λ,k) √(C_(j,R,B)),

where shift_(λ,k) is

√(λ₁χ_(n)²/n)+√(λ₂ŵ₂χ_(n−1)²/n)+ . . . +√(λ_(k)ŵ_(k)χ_(n−k+1)²/n),

with χ_(n−k+1)²=∥G_(k)∥²/σ_(k)². This shift_(λ,k) would be largest with λ_(k′) proportional to ŵ_(k′)χ_(n−k′+1)².

Outside of an exception set A_(h) developed further below, these χ_(n−k′+1)²/n are at least 1−h, with small positive h. Then shift_(λ,k) is at least √(1−h) times

√(λ₁ŵ₁)+√(λ₂ŵ₂)+ . . . +√(λ_(k)ŵ_(k)).

The test statistic 𝒵_(k,j)^(comb), along with its constituent Z_(k,j)^(comb), arises by plugging in particular choices λ̂ of these weights. Most choices that arise in our development will depend on the data, and thus exact normality of Z_(k,j)^(comb) does not hold. This matter is addressed using tools of empirical processes, to show uniformity of closeness of relative frequencies based on Z_(λ,k,j)^(comb) to the expectations based on the normal distribution. This uniformity can be exhibited over all λ in the simplex S_(k). For simplicity it is exhibited over a suitable subset of it.

5.2 Maximizing Separation:

Setting λ_(k′) equal to ŵ_(k′)/(1+ŵ₂+ . . . +ŵ_(k)) for k′≦k would be ideal, as it would maximize the resulting shift factor √(λ₁)+√(ŵ₂)√(λ₂)+ . . . +√(ŵ_(k))√(λ_(k)), for λεS_(k), making it equal to √(1+ŵ₂+ . . . +ŵ_(k))=√(ŝ_(k)), where ŝ_(k)=1/(1−q̂_(k−1)^(adj,tot)ν) and ŵ_(k′)=ŝ_(k′)−ŝ_(k′−1).

Setting λ_(k′) proportional to ŵ_(k′) may be ideal, but it suffers from the fact that, without advance knowledge of sent and other, the decoder does not have access to the separate values of q̂_(k)=Σ_(jεsent∩J_(k)) π_(j) 1_(ℋ_(k,j)) and f̂_(k)=Σ_(jεother∩J_(k)) π_(j) 1_(ℋ_(k,j)) needed for precise evaluation of ŵ_(k). A couple of means are devised to overcome this difficulty. The first is to take advantage of the fact that the decoder does have accept_(k)=q̂_(k)+f̂_(k)=Σ_(jεJ_(k)) π_(j) 1_(ℋ_(k,j)), which is the weighted count of terms above threshold on step k. The second is to use computation of ∥G_(k)∥²/n, which is σ_(k)²χ_(n−k+1)²/n, as an estimate of σ_(k)², with which a reasonable estimate of ŵ_(k) is obtained. A third method is to use residuals as discussed in the appendix, though its analysis is more involved.

5.3 Setting Weights λ̂ Based on accept_(k):

The first method uses accept_(k′) in place of q̂_(k′)^(adj) where it arises in the definition of ŵ_(k′), to produce a suitable choice of λ_(k′). Abbreviate accept_(k) as acc_(k) when needed to allow certain expressions to be suitably displayed. This accept_(k) upper bounds q̂_(k) and is not much greater than q̂_(k) when suitable control of the false alarms is achieved.

Recall ŵ_(k)=ŝ_(k)−ŝ_(k−1) for k>1, so finding the common denominator it takes the form

${{\hat{w}}_{k} = \frac{{\hat{q}}_{k - 1}^{adj}v}{\left( {1 - {{\hat{q}}_{k - 1}^{{adj},{tot}}v}} \right)\left( {1 - {{\hat{q}}_{k - 2}^{{adj},{tot}}v}} \right)}},$with the convention that q̂₀^(adj)=0. Let ŵ_(k)^(acc) be obtained by replacing q̂_(k−1)^(adj) with its upper bound of acc_(k−1)=accept_(k−1) and likewise replacing q̂_(k−2)^(adj,tot) and q̂_(k−1)^(adj,tot) with their upper bounds acc_(k−2)^(tot) and acc_(k−1)^(tot), respectively, with acc₀^(tot)=0. Thus as an upper bound on ŵ_(k) set

${{\hat{w}}_{k}^{acc} = \frac{{acc}_{k - 1}v}{\left( {1 - {{acc}_{k - 2}^{tot}v}} \right)\left( {1 - {{acc}_{k - 1}^{tot}v}} \right)}},$where for k=1 set ŵ_(k)^(acc)=ŵ_(k)=1. For k>1 this ŵ_(k)^(acc) is also

$\frac{1}{1 - {{acc}_{k - 1}^{tot}v}} - {\frac{1}{1 - {{acc}_{k - 2}^{tot}v}}.}$Now each accept_(k′) exceeds q̂_(k′)^(adj) and is less than q̂_(k′)^(adj)+2f̂_(k′).

Then set λ̂_(k′) proportional to ŵ_(k′)^(acc). Thus

${\hat{\lambda}}_{1} = \frac{1}{1 + {\hat{w}}_{2}^{acc} + \ldots + {\hat{w}}_{k}^{acc}}$and for k′ from 2 to k one has

${\hat{\lambda}}_{k^{\prime}} = {\frac{{\hat{w}}_{k^{\prime}}^{acc}}{1 + {\hat{w}}_{2}^{acc} + \ldots + {\hat{w}}_{k}^{acc}}.}$The shift factor √(λ̂₁)+√(λ̂₂ŵ₂)+ . . . +√(λ̂_(k)ŵ_(k)) is then equal to the ratio

$\frac{1 + \sqrt{{\hat{w}}_{2}^{acc}{\hat{w}}_{2}} + \ldots + \sqrt{{\hat{w}}_{k}^{acc}{\hat{w}}_{k}}}{\sqrt{1 + {\hat{w}}_{2}^{acc} + \ldots + {\hat{w}}_{k}^{acc}}}.$From ŵ_(k′)^(acc)≧ŵ_(k′) the numerator is at least 1+ŵ₂+ . . . +ŵ_(k)=ŝ_(k), equaling 1/(1−(q̂₁^(adj)+ . . . +q̂_(k−1)^(adj))ν), which per Lemma 4 is at least 1/(1−q̂_(k−1)^(tot,adj)ν). As for the sum in the denominator, it equals 1/(1−acc_(k−1)^(tot)ν). Consequently, the above shift factor using λ̂ is at least

$\frac{\sqrt{1 - {{acc}_{k - 1}^{tot}v}}}{1 - {{\hat{q}}_{k - 1}^{{tot},{adj}}v}}.$Recognizing that acc_(k−1)^(tot) and q̂_(k−1)^(tot) are similar when the false alarm effects are small, it is desirable to express this shift factor in the form

$\sqrt{\frac{1 - {\hat{h}}_{f,{k - 1}}}{1 - {{\hat{q}}_{k - 1}^{{tot},{adj}}v}}},$where ĥ_(f,k) for each k is a small term depending on false alarms.

Some algebra confirms this is so with

${\hat{h}}_{f,k} = {{\hat{f}}_{k}^{tot}\frac{\left( {2 - {{\hat{f}}_{k}^{tot}/{acc}_{k}^{tot}}} \right)v}{1 - {{\hat{q}}_{k}^{{tot},{adj}}v}}}$which is less than the value 2f̂_(k)^(tot)ν/(1−ν), equal to 2f̂_(k)^(tot)snr. Except in cases of large snr this approach is found to be quite suitable.

To facilitate a simple empirical process argument, replace each acc_(k) by its value ⌈acc_(k)L̃⌉/L̃ rounded up to a rational of denominator L̃ for some integer L̃ large compared to k. This restricts the acc_(k) to a set of values of cardinality L̃ and correspondingly the set of values of acc₁, . . . , acc_(k−1) determining ŵ₂^(acc), . . . , ŵ_(k)^(acc), and hence λ̂₁, . . . , λ̂_(k), is restricted to a set of cardinality (L̃)^(k−1). The resulting acc_(k)^(tot) is then increased by at most k/L̃ compared to the original value. With this rounding, one can deduce that ĥ_(f,k)≦2f̂_(k)^(tot)snr+k/L̃.
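A minimal Python sketch of this first method, including the optional rounding to rationals of denominator L̃, follows; the names and argument layout are illustrative assumptions.

```python
import numpy as np

def lambda_from_accepts(acc_tot, v, Ltilde=None):
    # acc_tot : [acc_1^tot, ..., acc_{k-1}^tot], cumulative accepted weights
    # v       : snr / (1 + snr)
    # Ltilde  : optional integer; round each total up to a rational of
    #           denominator Ltilde, as in the empirical process argument
    acc = np.asarray(acc_tot, dtype=float)
    if Ltilde is not None:
        acc = np.ceil(acc * Ltilde) / Ltilde
    s = np.concatenate(([1.0], 1.0 / (1.0 - acc * v)))  # s_1 = 1, s_k = 1/(1 - acc_{k-1}^tot v)
    w_acc = np.diff(s, prepend=0.0)                     # w_1^acc = 1, w_k^acc = s_k - s_{k-1}
    return w_acc, w_acc / w_acc.sum()                   # (w-hat^acc, lambda-hat on the simplex)
```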

Next proceed with defining natural exception sets outside of which q̂_(k′)^(tot) is at least a deterministic value q_(1,k′) and f̂_(k′)^(tot) is not more than a deterministic value f_(1,k′) for each k′ from 1 to k. This leads to q̂_(k)^(tot,adj) being at least q_(1,k)^(adj), where q_(1,k)^(adj)=q_(1,k)/(1+f_(1,k)/q_(1,k)) and ĥ_(f,k) is at most h_(f,k)=2f_(1,k)snr, and likewise for each k′≦k. This q_(1,k)^(adj) is regarded as an adjustment to q_(1,k) due to false alarms.

When rounding the acc_(k) to be rational of denominator L̃, it is accounted for by setting h_(f,k)=2f_(1,k)snr+k/L̃. The result is that the shift factor given above is at least the deterministic value given by √(1−h_(f,k−1))/√(1−q_(1,k−1)^(adj)ν), near 1/√(1−q_(1,k−1)^(adj)ν). Accordingly shift_(λ̂,k,j) exceeds the purified value

${\sqrt{\frac{1 - h^{\prime}}{1 - {q_{1,{k - 1}}^{adj}v}}}\sqrt{C_{j,R,B}}},$where 1−h′=(1−h_(f))(1−h), with h′=h+h_(f)−hh_(f), where h_(f)=h_(f,m−1) serves as an upper bound to the h_(f,k−1) for all steps k≦m.

5.4 Setting Weights λ̂ Based on Estimation of σ_(k)²:

The second method entails estimation of ŵ_(k) using an estimate of σ_(k)². For its development make use of the multiplicative relationship fromLemma 2,

${{\hat{s}}_{k} = {{\hat{s}}_{k - 1}\frac{{ACC}_{k - 1}}{\sigma_{k}^{2}}}},$where ACC_(k−1)=acc_(k−1)P=Σ_(j∈dec_(k−1))P_(j) is the un-normalized weight of terms above threshold on step k−1. Accordingly, from ŵ_(k)=ŝ_(k)−ŝ_(k−1) it follows that

${{\hat{w}}_{k} = {{\hat{s}}_{k - 1}\left( {\frac{{ACC}_{k - 1}}{\sigma_{k}^{2}} - 1} \right)}},$where the positivity of ŵ_(k) corresponds to ACC_(k−1)≧σ_(k) ². Also

${\hat{s}}_{k} = {\prod\limits_{k^{\prime} = 1}^{k - 1}\;{\frac{{ACC}_{k^{\prime} - 1}}{\sigma_{k^{\prime}}^{2}}.}}$Recognize that each 1/σ_(k′)²=χ_(n−k′+1)²/∥G_(k′)∥². Again, outside an exception set, replace each χ_(n−k′+1)² by its lower bound n(1−h), obtaining the lower bounding estimates

${{\hat{w}}_{k}^{low} = {{\hat{s}}_{k - 1}^{low}\left( {\frac{{ACC}_{k - 1}}{{\hat{\sigma}}_{k}^{2}} - 1} \right)}},$ where ${\hat{s}}_{k}^{low} = {\prod\limits_{k^{\prime} = 1}^{k - 1}\;\frac{{ACC}_{k^{\prime} - 1}}{{\hat{\sigma}}_{k^{\prime}}^{2}}}$ with ${\hat{\sigma}}_{k}^{2} = {\max{\left\{ {\frac{\left\| G_{k} \right\|^{2}}{n\left( {1 - h} \right)},{ACC}_{k - 1}} \right\}}}.$ Initializing with ŝ₁^(low)=ŵ₁^(low)=1, again have ŵ_(k)^(low)=ŝ_(k)^(low)−ŝ_(k−1)^(low) and hence ŝ_(k)^(low)=1+ŵ₂^(low)+ . . . +ŵ_(k)^(low). Set the weights of combination to be λ̂_(k′)=ŵ_(k′)^(low)/ŝ_(k)^(low), with which the shift factor is

$\frac{1 + \sqrt{{\hat{w}}_{2}^{low}{\hat{w}}_{2}} + \ldots + \sqrt{{\hat{w}}_{k}^{low}{\hat{w}}_{k}}}{\sqrt{{\hat{s}}_{k}^{low}}}.$Using ŵ_(k′)≧ŵ_(k′) ^(low) this is at least

${\frac{1 + {\hat{w}}_{2}^{low} + \ldots + {\hat{w}}_{k}^{low}}{\sqrt{{\hat{s}}_{k}^{low}}} = \sqrt{{\hat{s}}_{k}^{low}}},$which is √{square root over (ŝ_(k))} times the square root of

$\prod\limits_{k^{\prime} = 1}^{k - 1}{\left( \frac{\left( {1 - h} \right)n}{\chi_{n - k^{\prime} + 1}^{2}} \right).}$When using this method of estimating ŵ_(k), augment the exception set so that outside it one has χ_(n−k′+1)²/n≦(1+h). Then the above product is at least [(1−h)/(1+h)]^(k−1) and the shift factor shift_(λ̂,k) is at least

${\sqrt{{\hat{s}}_{k}\left( {1 - h^{\prime}} \right)} \geq \sqrt{\frac{1 - h^{\prime}}{1 - {q_{1,{k - 1}}^{adj}v}}}},$where now 1−h′=(1−h)[(1−h)/(1+h)]^(k−1). Here the additional (1−h) factor, as before, is to account in the definition of shift_(λ̂,k) for lower bounding the χ_(n−k′+1)²/n by (1−h).

Whether now the [(1−h)/(1+h)]^(k−1) is less of a drop than the (1−h_(f))=(1−2f_(1,k−1)snr) from before depends on the choice of h, the bound on the false alarms, the number of steps k and the signal to noise ratio snr.
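The second method can be sketched similarly in Python; here the guard on σ̂_(k)² is rendered as a clip that keeps the ŵ^(low) nonnegative, which is an interpretive assumption of the sketch, and the names are illustrative.

```python
import numpy as np

def lambda_from_variance_estimates(ACC, G_norm_sq, n, h):
    # ACC       : [ACC_1, ..., ACC_{k-1}], un-normalized accepted weight per step
    # G_norm_sq : [||G_2||^2, ..., ||G_k||^2] for steps 2..k
    # n, h      : block length and the chi-square slack outside the event A_h
    s = [1.0]                              # s_1^low = 1
    for acc, g2 in zip(ACC, G_norm_sq):
        sigma2 = g2 / (n * (1.0 - h))      # estimate of sigma_k^2 from ||G_k||^2
        ratio = max(acc / sigma2, 1.0)     # clip so w_k^low stays nonnegative (assumption)
        s.append(s[-1] * ratio)            # s_k^low = s_{k-1}^low * ACC_{k-1} / sigma2
    w_low = np.diff(s, prepend=0.0)        # w_1^low = 1, w_k^low = s_k^low - s_{k-1}^low
    return w_low, w_low / w_low.sum()      # lambda-hat_k' = w_k'^low / s_k^low
```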

Additional motivation for this choice of λ̂_(k) comes from consideration of the test statistics Z_(k,j)^(res)=X_(j)^(T)res_(k)/∥res_(k)∥ formed by taking the inner products of X_(j) with the standardized residuals, where res_(k) denotes the difference between Y and its projection onto the span of F₁, F₂, . . . , F_(k−1). It is shown in the appendix that these statistics have the same representation but with λ_(k′)=w_(k′)/s_(k), for k′≦k, where s_(k)=∥Y∥²/∥res_(k)∥² and w_(k)=s_(k)−s_(k−1), again initialized with s₁=w₁=1. In place of the iterative rule developed above

${{\hat{s}}_{k} = {{{\hat{s}}_{k - 1}\frac{{ACC}_{k - 1}}{\sigma_{k}^{2}}} = {{\hat{s}}_{k - 1}\frac{{ACC}_{k - 1}\chi_{n - k + 1}^{2}}{\left\| G_{k} \right\|^{2}}}}},$these residual-based s_(k) are shown there to satisfy

$s_{k} = {s_{k - 1}\frac{\left\| {\overset{\sim}{F}}_{k - 1} \right\|^{2}}{\left\| G_{k} \right\|^{2}}}$where F̃_(k−1) is the part of F_(k−1) orthogonal to the previous F_(k′) for k′=1, . . . , k−2.

Intuitively, given that the coordinates of X_(j) are i.i.d. with mean 0 and variance 1, this ∥F̃_(k−1)∥² should not be too different from ∥F_(k−1)∥², which should not be too different from nACC_(k−1). So these properties give additional motivation for this choice. It is also tempting to try to see whether this λ based on the residuals could be amenable to the method of analysis herein. It would seem that one would need additional properties of the design matrix X, such as uniform isometry properties of subsets of certain sizes. However, it is presently unclear whether such properties could be assured without harming the freedom to have rate up to capacity. For now stick to the simpler analysis based on the estimates here of the ŵ_(k) that maximize separation.

5.5 Exception Events and Purified Statistics:

Consider more explicitly the exception events A_(q)=∪_(k′=1)^(k−1){q̂_(k′)^(tot)<q_(1,k′)} and A_(f)=∪_(k′=1)^(k−1){f̂_(k′)^(tot)>f_(1,k′)}. As said, one may also work with the related events ∪_(k′=1)^(k−1){q̂_(1,k′)<q_(1,k′)} and ∪_(k′=1)^(k−1){f̂_(1,k′)>f_(1,k′)}.

Define the Chi-square exception event A_(h) to include ∪_(k′=1)^(k){χ_(n−k′+1)²/n≦1−h}, or equivalently ∪_(k′=1)^(k){χ_(n−k′+1)²/(n−k′+1)≦(1−h_(k′))} where h_(k′) is related to h by the equation (n−k′+1)(1−h_(k′))=n(1−h). For the second method it is augmented by including also ∪_(k′=1)^(k){χ_(n−k′+1)²/n≧1+h}.

The overall exception event is A=A_(q)∪A_(f)∪A_(h). When outside this exception set, the shift_(λ̂,k,j) exceeds the purified value given by

${{shift}_{k,j}} = {\sqrt{\frac{1 - h^{\prime}}{1 - {q_{1,{k - 1}}^{adj}v}}}{\sqrt{C_{j,R,B}}.}}$Recalling that C_(j,R,B)=π_(j)νL(log B)/R, the factor 1−h′ may be absorbed into the expression by letting C_(j,R,B,h)=C_(j,R,B)(1−h′). Or in terms of the section index ℓ write C_(ℓ,R,B,h)=C_(ℓ,R,B)(1−h′). Then the above lower bound on the shift may be expressed as

$\sqrt{\frac{C_{j,R,B,h}}{1 - {x\; v}}}$evaluated at x=q_(1,k−1) ^(adj), also denoted as

${shift}_{\ell\;,x} = \sqrt{\frac{C_{\ell,R,B,h}}{1 - {x\; v}}}$

For λ in S_(k), set H_(λ,k,j) to be the purified event that the approximate combined statistic shift_(k,j)1_(j sent)+Z_(λ,k,j)^(comb) is at least the threshold τ. That is, H_(λ,k,j)={shift_(k,j)1_(j sent)+Z_(λ,k,j)^(comb)≧τ}, where in contrast to

ℋ_(k,j)={𝒵_(k,j)^(comb)≧τ}, a standard rather than a calligraphic font is used for this event H_(λ,k,j) based on the normal Z_(λ,k,j)^(comb) with the purified shift.

Recall that the coordinates of λ, denoted λ_(k′,k) for k′=1, 2, . . . k,have dependence on k. For each k, the λ_(k′,k) can be determined fromnormalization of segments of the first k in sequences w₁, w₂, . . . ,w_(m) of positive values. With an abuse of notation, also denote thesequence for k=1, 2, . . . , m of such standardized combinationsZ_(λ,k,j) ^(comb) as

$Z_{w,k,j}^{comb} = {\frac{{\sqrt{w_{1}}Z_{1,j}} - {\sqrt{w_{2}}Z_{2,j}} - \ldots - {\sqrt{w_{k}}Z_{k,j}}}{\sqrt{w_{1} + w_{2} + \ldots + w_{k}}}.}$In this case the corresponding event H_(λ,k,j) is also denotedH_(w,k,j).

Except in A_(q)∪A_(f)∪A_(h), the event ℋ_(k,j) contains H_(λ̂,k,j), also denoted H_(ŵ^(acc),k,j) or H_(ŵ^(low),k,j), respectively, for the two methods of estimating ŵ.

Also, as for the actual test statistics, the purified forms satisfy the updates Z_(λ,k,j)^(comb)=√(1−λ_(k))Z_(λ,k−1,j)^(comb)−√(λ_(k))Z_(k,j), where λ_(k)=λ_(k,k).

5.6 Definition of the Update Function:

Via C_(j,R,B) the expression for the shift is decreasing in R. Smaller R produces a bigger shift and greater statistical distinguishability between the terms sent and those not sent. This is a property commensurate with the communication interest in the largest R for which, after a suitable number of steps, one can reliably distinguish most of the terms.

Take note for j sent that shift_(k,j) is equal to

${\mu_{j}(x)} = \sqrt{\frac{C_{j,R,B,h}}{1 - {xv}}}$evaluated at x=q_(1,k−1)^(adj). To bound the probability with which a term sent is successfully detected by step k, examine the behavior of Φ(μ_(j)(x)−τ), which, at that x, is the probability of the purified event H_(λ,k,j) for j in sent, based on the standard normal cumulative distribution of Z_(λ,k,j)^(comb). This Φ(μ_(j)(x)−τ) is increasing in x.

For constant power allocation the contributions Φ(μ_(j)(x)−τ) are the same for all j in sent, whereas, for decreasing power assignments, one has a variable detection probability. Note that it is greater than ½ for those j for which μ_(j)(x) exceeds τ. As x increases, there is a growing set of sections for which μ_(j)(x) sufficiently exceeds τ, such that these sections have high probability of detection.

The update function g_(L)(x) is defined as the π weighted average ofthese Φ(μ_(j)(x)−τ) for j sent, namely,

${g_{L}(x)} = {\sum\limits_{j\mspace{14mu}{sent}}\;{\pi_{j}{\Phi\left( {{\mu_{j}(x)} - \tau} \right)}}}$or, equivalently,

${{g_{L}(x)} = {\sum\limits_{\ell = 1}^{L}\;{\pi_{(\ell)}{\Phi\left( {{shift}_{\ell,x} - \tau} \right)}}}},$an L term sum. That is, g_(L)(x) is the expectation of the sample weighted fraction Σ_(j sent)π_(j)1_(H_(λ,k,j)) for any λ in S_(k). The idea is that for any given x this sample weighted fraction will be near g_(L)(x), except in an event of exponentially small probability.

This update function g_(L) on [0,1] indeed depends on the power allocation π as well as the design parameters L, B, R, and the value a determining τ=√(2 log B)+a. Plus it depends on the signal to noise ratio via ν=snr/(1+snr). The explicit use of the subscript L is to distinguish the sum g_(L)(x) from an integral approximation to it denoted g that will arise later below.
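Because later sections evaluate g_(L) repeatedly, a direct numeric rendering is useful; the following Python sketch assumes vectorized inputs (illustrative names) and uses SciPy for the normal distribution function.

```python
import numpy as np
from scipy.stats import norm

def update_gL(x, pi, C, v, tau):
    # x   : previous adjusted detection rate q_(1,k-1)^(adj)
    # pi  : weights pi_j of the sent terms (summing to 1)
    # C   : constants C_(j,R,B,h) for the sent terms
    # v   : snr / (1 + snr)
    # tau : threshold sqrt(2 log B) + a
    mu = np.sqrt(C / (1.0 - x * v))               # mu_j(x), the purified shift
    return float(np.sum(pi * norm.cdf(mu - tau)))  # pi-weighted average of Phi(mu_j(x) - tau)
```

For constant power allocation, where π_(j) and C_(j,R,B,h) are the same across j in sent, this sum reduces to the single term Φ(μ(x)−τ).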

6 Detection Build-up with False Alarm Control

In this section, target false alarm rates are set and a framework isprovided for the demonstration of accumulation of correct detections ina moderate number of steps.

6.1 Target False Alarm Rates:

A target weighted false alarm rate for step k arises as a bound f* on the expected value of Σ_(j other)π_(j)1_(ℋ_(k,j)). This expected value is (B−1)Φ̄(τ), where Φ̄(τ) is the upper tail probability with which a standard normal exceeds the threshold τ=√(2 log B)+a. A tight bound is

$\frac{1}{\left( {\sqrt{2\mspace{11mu}\log\mspace{11mu} B} + a} \right)\sqrt{2\pi}}\exp{\left\{ {{{- a}\sqrt{2\mspace{11mu}\log\mspace{11mu} B}} - {\left( {1/2} \right)a^{2}}} \right\}.}$There is occasion to make use of the similar choice of f* equal to

$\frac{1}{\left( \sqrt{2\mspace{11mu}\log\mspace{11mu} B} \right)\sqrt{2\pi}}\exp{\left\{ {{- a}\sqrt{2\mspace{11mu}\log\mspace{11mu} B}} \right\}.}$The fact that these indeed upper bound (B−1)Φ̄(τ) follows from Φ̄(x)≦φ(x)/x for positive x, with φ being the standard normal density. Likewise set f>f*. Express f=ρf* with ρ>1. Across the steps k, the choice of constant a_(k)=a produces constant f_(k)*=f* with sum f_(1,k)* equal to kf*. Furthermore, set f_(1,k)>f_(1,k)*, which arises in upper bounding the total false alarm rate. In particular, it is arranged for the ratio f_(1,k)/f_(1,k)* to be at least as large as a fixed ρ>1.
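As a numeric sketch of the first of these bounds (the function name is an assumption), with B=2¹⁶ and a=1.3375 the value comes out on the order of 5×10⁻⁵:

```python
import math

def false_alarm_target(B, a):
    # Upper bound on (B-1) * Phibar(tau) with tau = sqrt(2 log B) + a,
    # using Phibar(x) <= phi(x)/x for positive x.
    r = math.sqrt(2.0 * math.log(B))
    return math.exp(-a * r - 0.5 * a * a) / ((r + a) * math.sqrt(2.0 * math.pi))

print(false_alarm_target(2**16, 1.3375))  # roughly 5e-5
```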

At the final step m, let f̄*=f_(1,m)*=mf* be the baseline total false alarm rate, and use f̄=f_(1,m), typically equal to ρf̄*, to be a value which will be shown to likely upper bound Σ_(j other)π_(j)1_(∪_(k=1)^(m)ℋ_(k,j)).

As will be explored soon, it is needed for f_(1,k) to stay less than a target increase in the correct detection rate each step. As this increase will be a constant times 1/log B, for certain rates close to capacity, this will then mean that f̄ and hence f̄* need to be bounded by a multiple of 1/log B. Moreover, the number of steps m will be of order log B. So with f̄*=mf* this means f* is to be of order 1/(log B)². From the above expression for f*, this will entail choosing a value of a near (3/2)(log log B)/√(2 log B).

6.2 Target Total Detection Rate:

A target total detection rate q_(1,k)* and the associated values q_(1,k)and q_(1,k) ^(adj) are recursively defined using the function g_(L)(x).

In particular, per the preceding section, let

$q_{1,k}^{*} = {\sum\limits_{j\mspace{14mu}{sent}}\;{\pi_{j}{\Phi\left( {{shift}_{k,j} - \tau} \right)}}}$which is seen to be q_(1,k)*=g_(L)(x) evaluated at x=q_(1,k−1)^(adj). The convention is adopted at k=1 that the previous q_(1,k−1) and x=q_(1,k−1)^(adj) are initialized at 0. To complete the specification, a sequence of small positive η_(k) is chosen with which it is set that q_(1,k)=q_(1,k)*−η_(k). For instance one may set η_(k)=η. The idea is that these η_(k) will control the exponents of tail probabilities of the exception set outside of which q̂_(k)^(tot) exceeds q_(1,k). With this choice of q_(1,k) and f_(1,k) one has also q_(1,k)^(adj)=q_(1,k)/(1+f_(1,k)/q_(1,k)).

Positivity of the gap g_(L)(x)−x provides that q_(1,k) is larger than q_(1,k−1)^(adj). As developed in the next subsection, the contributions from η_(k) and f_(1,k) are arranged to be sufficiently small that q_(1,k)^(adj) and q_(1,k) are increasing with each such step. In this way the analysis will quantify, as x increases, the increasing proportion of terms likely to be above threshold.
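The recursion defining q_(1,k) and q_(1,k)^(adj) is straightforward to carry out; the following Python sketch assumes a callable for g_(L) and lists of the η_(k) and f_(1,k) (names are illustrative).

```python
def detection_targets(gL, eta, f1, m):
    # gL  : callable evaluating the update function g_L(x)
    # eta : slacks eta_k for k = 1..m
    # f1  : cumulative false alarm bounds f_(1,k) for k = 1..m
    x = 0.0                          # q_(1,0)^(adj) initialized at 0
    q, q_adj = [], []
    for k in range(m):
        qk = gL(x) - eta[k]          # q_(1,k) = q_(1,k)* - eta_k
        x = qk / (1.0 + f1[k] / qk)  # q_(1,k)^(adj), the next argument of g_L
        q.append(qk)
        q_adj.append(x)
    return q, q_adj
```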

6.3 Building Up the Total Detection Rate:

The framework is given here for how the likely total correct detection rate q_(1,k) builds up to a value near 1, followed by the corresponding conclusion of reliability of the adaptive successive decoder. Here the notion of correct detection being accumulative is defined. This notion holds for the power allocations studied herein.

Recall that with the function g_(L)(x) defined above, for each step one updates the new q_(1,k) by choosing it to be slightly less than q_(1,k)*=g_(L)(q_(1,k−1)^(adj)). The choice of q_(1,k) is accomplished by setting a small positive η_(k) for which q_(1,k)=q_(1,k)*−η_(k). These may be constant, that is η_(k)=η, across the steps k=1, 2, . . . , m.

There are slightly better alternative choices for the η_(k) motivated by the reliability bounds. One is to arrange for D(q_(1,k)∥q_(1,k)*) to be constant, where D is the relative entropy between Bernoulli random variables of the indicated success probabilities. Another is to arrange η_(k) such that η_(k)/√(V_(k)) is constant, where V_(k)=V(x) evaluated at x=q_(1,k−1)^(adj), where

${{V(x)}/L} = {\sum\limits_{j\mspace{14mu}{sent}}\;{\pi_{j}^{2}{\Phi\left( {\mu_{j}(x)} \right)}{{\overset{\_}{\Phi}\left( {\mu_{j}(x)} \right)}.}}}$This V_(k)/L may be interpreted as a variance of q̂_(1,k) as developed below. The associated standard deviation factor √(V(x)) is shown in the appendix to be proportional to (1−xν).

With evaluation at x=q_(1,k−1) ^(adj), this gives rise to η_(k)=η(x)equal to (1−xν) times a small constant.

How large one can pick η_(k) will be dictated by the size of the gapg_(L)(x)−x at x=q_(1,k−1) ^(adj).

Let x* be any given value between 0 and 1, preferably not far from 1.

Definition:

A positive increasing function g(x) bounded by 1 is said to be accumulative for 0≦x≦x* if there is a function gap(x)>0, with g(x)−x≧gap(x) for all 0≦x≦x*. An adaptive successive decoder with rate and power allocation chosen so that the update function g_(L)(x) satisfies this property is likewise said to be accumulative. The shortfall is defined by δ*=1−g_(L)(x*).

If the update function is accumulative and has a small shortfall, then it is demonstrated, for a range of choices of η_(k)>0 and f_(1,k)>f_(1,k)*, that the target total detection rate q_(1,k) increases to a value near 1 and that the weighted fraction of mistakes is with high probability less than δ_(k)=(1−q_(1,k))+f_(1,k). This mistake rate δ_(k) is less than 1−x* after a number of steps, and then with one more step it is further reduced to a value not much more than δ*=1−g_(L)(x*), to take advantage of the amount by which g_(L)(x*) exceeds x*.

The tactic in providing good probability exponents will be todemonstrate, for the sparse superposition code, that there is anappropriate size gap. It will be quantified via bounds on the minimum ofthe gap or the minimum of the ratio gap(x)/(1−xν) that arises in astandardization of the gap, where the minimum is taken for 0≦x≦x*.

The following lemmas relate the sizes of η and f and the number of stepsm to the size of the gap.

Lemma 5.

Suppose the update function g_(L)(x) is accumulative on [0,x*] with g_(L)(x)−x≧gap for a positive constant gap>0. Arrange positive constants η and f̄ and m*≧2, such that η+f̄+1/(m*−1)=gap. Suppose f_(1,k)≦f̄, as arises from f_(1,k)=f̄ or from f_(1,k)=kf for each k≦m* with f=f̄/m*. Set q_(1,k)=q_(1,k)*−η. Then q_(1,k) is increasing on each step for which q_(1,k−1)−f_(1,k−1)≦x*, and, for such k, the increment q_(1,k)−q_(1,k−1) is at least 1/(m*−1). The number of steps k=m−1 required such that q_(1,k)−f_(1,k) first exceeds x* is bounded by m*−1. At the final step m≦m*, the weighted fraction of mistakes target δ_(m)=(1−q_(1,m))+f_(1,m) satisfies δ_(m)≦δ*+η+f̄.

The value δ_(m)=(1−q_(1,m))+f_(1,m) is used in controlling the sum ofweighted fractions of failed detections and of false alarms.

In the decomposition of the gap, think of η and f̄ as providing portions of the gap which contribute to the probability exponent and false alarm rate, respectively, whereas the remaining portion controls the number of steps.
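As a hypothetical numeric illustration of Lemma 5 (the numbers are assumed, not from the specification): with gap=0.05 one may apportion η=0.01 to the exponent and f̄=0.01 to the false alarms, leaving 1/(m*−1)=0.03, so m*≈34.3; the detection rate then climbs by at least 0.03 per step, at most 34 steps are needed, and the final mistake target satisfies δ_(m)≦δ*+0.02.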

The following is an analogous conclusion for the case of a variable size gap bound. It allows for somewhat greater freedom in the choices of the parameters, with η_(k) and f_(1,k) determined by functions η(x) and f̄(x), respectively, evaluated at x=q_(1,k−1)^(adj).

Lemma 6.

Suppose the update function is accumulative on [0,x*]. Choose positive functions η(x) and f̄(x) on [0,x*] with gap(x)−η(x)−f̄(x) not less than a positive value denoted gap′. Suppose q_(1,k)=q_(1,k)*−η_(k) where η_(k)≦η(q_(1,k−1)^(adj)) and f_(1,k)≦f̄(q_(1,k−1)^(adj)). Then q_(1,k)−q_(1,k−1)>gap′ on each step for which q_(1,k−1)^(adj)≦x*, and the number of steps k such that the q_(1,k)^(adj) first exceeds x* is bounded by 1/gap′. With a number of steps m≦1+1/gap′, the δ_(m)=(1−q_(1,m))+f_(1,m) satisfies δ_(m)≦δ*+η_(m)+f_(1,m).

The proofs for Lemmas 5 and 6 are given in Appendix 14.3. One has thechoice whether to be bounding the number of steps such that q_(1,k)^(adj) first exceeds x* or such that the slightly smaller valueq_(1,k)−f_(1,k) first exceeds x*. The latter provides the slightlystronger conclusion that δ_(k)≦1−x*. Either way, at the penultimate stepq_(1,k) ^(adj) is at least x*, which is sufficient for the next stepm=k+1 to take us to a larger value of q_(1,m)* at least g_(L)(x*). Soeither formulation yields the stated conclusion.

Associated with the use of the factor (1−xν) there is the followingimproved conclusion, noting that GAP is necessarily larger than theminimum of gap(x).

Lemma 7.

Suppose that g_(L)(x)−x is at least gap(x)=(1−xν)GAP for 0≦x≦x* with a positive GAP. Again there is convergence of the q_(1,k) to values at least x*. Arrange positive η_(std) and m* with

${GAP} = {\eta_{std} + {\frac{\log\mspace{11mu}{1/\left( {1 - x^{*}} \right)}}{m^{*} - 1}.}}$Set η(x)=(1−xν)η_(std) and f̄≦(1−ν)GAP′ with GAP′=[log 1/(1−x*)]/(m*−1), and set η_(k)=η(x) at x=q_(1,k−1)^(adj) and f_(1,k)≦f̄. Then the number of steps k=m−1 until x_(k) first exceeds x* is not more than m*−1. Again at step m the δ_(m)=(1−q_(1,m))+f_(1,m) satisfies δ_(m)≦δ*+η_(m)+f̄.

Demonstration of Lemma 7:

One has q_(1,k)=g_(L)(q_(1,k−1)^(adj))−η(q_(1,k−1)^(adj)), at least q_(1,k−1)^(adj)+(1−q_(1,k−1)^(adj)ν)(GAP−η_(std)). Subtracting f̄ as a bound on f_(1,k), it yields q_(1,k)^(adj)≧q_(1,k−1)^(adj)+(1−q_(1,k−1)^(adj))νGAP′. This implies, with x_(k)=q_(1,k)^(adj) and ε=νGAP′, that x_(k)≧(1−ε)x_(k−1)+ε, or equivalently, (1−x_(k))≦(1−ε)(1−x_(k−1)), as long as x_(k−1)≦x*. Accordingly for such k, there is the exponential bound (1−x_(k))≦(1−ε)^(k)≦e^(−εk)=e^(−νGAP′k), and the number of steps k=m−1 until x_(k) first exceeds x* satisfies

${m - 1} \leq \frac{\log\mspace{11mu}{1/\left( {1 - x^{*}} \right)}}{\log\mspace{11mu}{1/\left( {1 - \varepsilon} \right)}} \leq {\frac{\log\mspace{11mu}{1/\left( {1 - x^{*}} \right)}}{v\mspace{11mu}{GAP}^{\prime}}.}$This bound is m*−1. The final step takes q_(1,m)* to a value at least g_(L)(x*) so δ_(m)≦δ*+η_(m)+f_(1,m). This completes the demonstration of Lemma 7.

The idea here is that by extracting the factor (1−xν), which is small if x and ν are near 1, it follows that a value GAP with larger constituents η_(std) and GAP′ can be extracted than the previous constant gap, though to do so one pays the price of the log 1/(1−x*) factor.

Concerning the choice of f_(1,k), consider setting f_(1,k)=f̄ for all k from 1 to m. This constant f_(1,k)=f̄ remains bigger than f_(1,k)*=kf*, with minimum ratio f̄/f̄* at least ρ>1. To give a reason for choosing a constant false alarm bound, note that with f_(1,k) equal to f_(1,m)=f̄, it is greater than f_(1,m)*=f̄*, which exceeds f_(1,k)* for k<m. Accordingly, the relative entropy exponent (B−1)D(p_(1,k)∥p_(1,k)*) that arises in the probability bound in the next section is smallest at k=m, where it is at least f̄𝒟(ρ)/ρ, where 𝒟(ρ) is the positive value ρ log ρ−(ρ−1).

In contrast, one has the seemingly natural choice f_(1,k)=kf of linear growth in the false alarm bound, with f=f*ρ. It is also upper bounded by f̄ for k≦m and has constant ratio f_(1,k)/f_(1,k)* equal to ρ. It yields a corresponding exponent of kf𝒟(ρ)/ρ for k=1 to m. However, this exponent has a value at k=1 that can be seen to be smaller by a factor of order 1/m. For the same final false alarm control, it is preferable to arrange the larger order exponent, by keeping D(p_(1,k)∥p_(1,k)*) at least its value at k=m.

7 Reliability of Adaptive Successive Decoding

Herein it is established, for any power allocation and rate for which the decoder is accumulative, the reliability with which the weighted fractions of mistakes are governed by the studied quantities 1−q_(1,m) plus f_(1,m). The bounds on the probabilities with which the fractions of mistakes are worse than such targets are exponentially small in L. The implication is that if the power assignment and the communication rate are such that the function g_(L) is accumulative on [0,x*], then for a suitable number of steps, the tail probability for weighted fraction of mistakes more than δ*=1−g_(L)(x*) is exponentially small in L.

7.1 Reliability Using the Data-Driven Weights:

In this subsection reliability is demonstrated using the data-driven weights λ̂ in forming the statistic 𝒵_(k,j)^(comb). Subsection 7.2 discusses a slightly different approach which uses deterministic weights and provides slightly smaller error probability bounds.

Theorem 8.

Reliable communication by sparse superposition codes with adaptive successive decoding. With total false alarm rate targets f_(1,k)>f_(1,k)* and update function g_(L), set recursively the detection rate targets q_(1,k)=g_(L)(q_(1,k−1)^(adj))−η_(k), with η_(k)=q_(1,k)*−q_(1,k)>0 set such that it yields an increasing sequence q_(1,k) for steps 1≦k≦m. Consider δ̂_(m), the weighted failed detection rate plus false alarm rate. Then the m step adaptive successive decoder incurs δ̂_(m) less than δ_(m)=(1−q_(1,m))+f_(1,m), except in an event of probability with upper bound as follows:

$\sum\limits_{k = 1}^{m}\left\lbrack e^{-L_{\pi}D(q_{1,k} \| q_{1,k}^{*}) + (k-1)\log\tilde{L} + c_{0}k} \right\rbrack + \sum\limits_{k = 1}^{m}\left\lbrack e^{-L_{\pi}(B-1)D(p_{1,k} \| p_{1,k}^{*}) + (k-1)\log\tilde{L}} \right\rbrack + \sum\limits_{k = 1}^{m} e^{-(n-k+1)D_{h_{k}}},\qquad(I)$

where the terms correspond to tail probabilities concerning, respectively, the fractions of correct detections, the fractions of false alarms, and the tail probabilities for the events {∥G_(k)∥²/σ_(k)²≦n(1−h)}, on steps 1 to m. Here L_(π)=1/max_(j)π_(j). The p_(1,k), p_(1,k)* equal the corresponding f_(1,k), f_(1,k)* divided by B−1. Also D_(h)=−log(1−h)−h is at least h²/2. Here h_(k)=(nh−k+1)/(n−k+1), so the exponent (n−k+1)D_(h_(k)) is near nD_(h), as long as k/n is small compared to h.

II) A Refined Probability Bound Holds as in I Above but with Exponent

$L\frac{\eta_{k}^{2}}{V_{k} + {\left( {1/3} \right){\eta_{k}\left( {L/L_{\pi}} \right)}}}$

in place of L_(π)D(q_(1,k)∥q_(1,k)*) for each k=1, 2, . . . , m.

Corollary 9.

Suppose the rate and power assignments of the adaptive successive code are such that g_(L) is accumulative on [0,x*] with a positive constant gap and a small shortfall δ*=1−g_(L)(x*). Assign positive η_(k)=η and f_(1,k)=f̄ and m≧2 with 1−q_(1,m)≦δ*+η. Let 𝒟(ρ)=ρ log ρ−(ρ−1). Then there is a simplified probability bound. With a number of steps m, the weighted failed detection rate plus false alarm rate is less than δ*+η+f̄, except in an event of probability not more than

m e^(−2L_(π)η²+m[c₀+log L̃]) + m e^(−L_(π)f̄𝒟(ρ)/ρ+m log L̃) + m e^(−(n−m+1)h_(m)²/2).

The bound in the corollary is exponentially small in 2L_(π)η² if h is chosen such that (n−m+1)h_(m)²/2 is at least 2L_(π)η², and ρ>1 and f̄ are chosen such that f̄[log ρ−1+1/ρ] matches 2η².

Improvement is possible using II, in which case it is found that V_(k) is of order 1/√(log B). This produces a probability bound exponentially small in Lη²(log B)^(1/2) for small η.

Demonstration of Theorem 8 and its Corollary:

False alarms occur on step k when there are terms j in other∩J_(k) for which there is occurrence of the event ℋ_(k,j), which is the same for such j in other as the event H_(ŵ^(acc),k,j), as there is no shift of the statistics for j in other. The weighted fraction of false alarms up to step k is f̂₁+ . . . +f̂_(k) with increments f̂_(k)=Σ_(j∈other∩J_(k))π_(j)1_(ℋ_(k,j)). This increment excludes the terms in dec_(1,k−1) which are previously decoded. Nevertheless, introducing associated random variables for these excluded events (with the distribution discussed in the proof of Lemmas 1 and 2), the sum may be regarded as the weighted fraction of the union Σ_(j∈other)π_(j)1_(∪_(k′=1)^(k)ℋ_(k′,j)).

Recall, as previously discussed, for all such j in other, the event H_(w,k′,j) is the event that Z_(w,k′,j)^(comb) exceeds τ, where for each w=(1, w₂, w₃, . . . , w_(k)), the Z_(w,k′,j)^(comb) are standard normal random variables, independent across j in other. So the events ∪_(k′=1)^(k)H_(w,k′,j) are independent and equiprobable across such j. Let p_(1,k)* be their probability or an upper bound on it, and let p_(1,k)>p_(1,k)*. Then A_(f,k)={f̂_(k)^(tot)≧f_(1,k)} is contained in the union over all possible w of the events {p̂_(w,1,k)≧p_(1,k)} where

${\hat{p}}_{w,1,k} = {\frac{1}{B - 1}{\sum\limits_{j \in {other}}\;{\pi_{j}{1_{\bigcup_{k^{\prime} = 1}^{k}H_{w,k^{\prime},j}}.}}}}$With the rounding of the acc_(k) to rationals of denominator L̃, the cardinality of the set of possible w is at most L̃^(k−1). Moreover, by Lemma 46 in the appendix, the probability of the events {p̂_(w,1,k)≧p_(1,k)} is less than e^(−L_(π)(B−1)D(p_(1,k)∥p_(1,k)*)). So by the union bound the probability of {f̂_(k)^(tot)≧f_(1,k)} is less than (L̃)^(k−1) e^(−L_(π)(B−1)D(p_(1,k)∥p_(1,k)*)).
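In code this tail bound is a direct composition of the Bernoulli relative entropy and the union-bound factor; the helper names below are illustrative assumptions.

```python
import math

def bernoulli_kl(q, q_star):
    # Relative entropy D(q || q*) between Bernoulli(q) and Bernoulli(q*)
    return q * math.log(q / q_star) + (1 - q) * math.log((1 - q) / (1 - q_star))

def false_alarm_tail(L_pi, B, p1k, p1k_star, k, Ltilde):
    # Union bound over the at most Ltilde^(k-1) possible weight vectors w
    return Ltilde ** (k - 1) * math.exp(-L_pi * (B - 1) * bernoulli_kl(p1k, p1k_star))
```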

Likewise, investigate the weighted proportion of correct decodings q̂_(m)^(tot) and the associated values q̂_(1,k)=Σ_(j sent)π_(j)1_(ℋ_(k,j)), which are compared to the target values q_(1,k) at steps k=1 to m. The event {q̂_(1,k)<q_(1,k)} is contained in 𝒜_(k), so when bounding its probability, incurring a cost of a factor of e^(kc₀), one may switch to the simpler measure ℚ.

Consider the event A=∪_(k=1)^(m)A_(k), where A_(k) is the union of the events {q̂_(1,k)≦q_(1,k)}, {f̂_(k)^(tot)≧f_(1,k)} and {χ_(n−k+1)²/n<1−h}. This event A may be decomposed as the union for k from 1 to m of the disjoint events A_(k)∩_(k′=1)^(k−1)A_(k′)^(c). The Chi-square event may be expressed as A_(h,k)={χ_(n−k+1)²/(n−k+1)<1−h_(k)}, which has the probability bound e^(−(n−k+1)D_(h_(k))). So to bound the probability of A, it remains to bound, for k from 1 to m, the probability of the event A_(q,k)={q̂_(1,k)<q_(1,k)}∩A_(h,k)^(c)∩_(k′=1)^(k−1)A_(k′)^(c). In this event, with the intersection of A_(k′)^(c) for all k′<k and the intersection with the Chi-square event A_(h,k)^(c), the statistic

𝒵_(k,j)^(comb) exceeds the corresponding approximation √(s_(k))√(C_(j,R,B,h))1_(j sent)+Z_(ŵ^(acc),k,j)^(comb), where s_(k)=1/[1−q_(1,k−1)^(adj)ν]. There is a finite set of possible ŵ^(acc) associated with the grid of values of acc₁, . . . , acc_(k−1) rounded to rationals of denominator L̃. Now A_(q,k) is contained in the union across possible w of the events

{q̂_(w,1,k)<q_(1,k)} where${\hat{q}}_{w,1,k} = {\sum\limits_{j\mspace{14mu}{sent}}\;{\pi_{j}{1_{\{{Z_{w,k,j}^{comb} \geq a_{k,j}}\}}.}}}$Here a_(k,j)=τ−√(s_(k))√(C_(j,R,B,h)). With respect to ℚ, these Z_(w,k,j)^(comb) are standard normal, independent across j, so the Bernoulli random variables 1_({Z_(w,k,j)^(comb)≧a_(k,j)}) have success probability Φ̄(a_(k,j)) and accordingly, with respect to ℚ, the q̂_(w,1,k) has expectation q_(1,k)*=Σ_(j sent)π_(j)Φ̄(a_(k,j)). Thus, again by Lemma 46 in the appendix, the probability of {q̂_(w,1,k)<q_(1,k)} is not more than e^(−L_(π)D(q_(1,k)∥q_(1,k)*)). By the union bound, multiply this by (L̃)^(k−1) to bound ℚ(A_(q,k)). One may sum it across k to bound the probability of the union.

The Chi-square random variables and the normal statistics for j in other have the same distribution with respect to ℙ and ℚ, so there is no need to multiply by the e^(c₀k) factor for the A_(h) and A_(f) contributions.

The event of interest A_(q̂_(m)^(tot))={q̂_(m)^(tot)≦q_(1,m)} is contained in the union of the event A_(q̂_(m)^(tot))∩A_(q,m−1)^(c)∩A_(f)^(c)∩A_(h)^(c) with the events A_(q,m−1), A_(h) and A_(f), where A_(h)=∪_(k=1)^(m)A_(h,k) and A_(f)=∪_(k=1)^(m)A_(f,k). The three events A_(q,m−1), A_(h) and A_(f) are clearly part of the event A, which has been shown to have the indicated exponential bound on its probability. This leaves us with the event A_(q̂_(m)^(tot))∩A_(q,m−1)^(c)∩A_(f)^(c)∩A_(h)^(c). Now, as has been seen earlier herein, q̂_(m)^(tot) may be regarded as the weighted proportion of occurrence of the union ∪_(k=1)^(m)ℋ_(k,j), which is at least Σ_(j sent)π_(j)1_(∪_(k=1)^(m)ℋ_(k,j)). Outside the exception sets A_(h), A_(f) and A_(q,m−1), it is at least q̂_(1,m)=Σ_(j sent)π_(j)1_(ℋ_(m,j)). With the indicated intersections, the above event is contained in A_(q,m)={q̂_(1,m)≦q_(1,m)}, which is also part of the event A. So by containment in a union of events for which we have the probability bounds, the indicated bound holds.

As a consequence of the above conclusion, outside the event A at step k=m, one has q̂_(m)^(tot)>q_(1,m). Thus outside A the weighted fraction of failed detections, which is not more than 1−q̂_(1,m), is less than 1−q_(1,m). Also outside A, the weighted fraction of false alarms is less than f_(1,m). So the total weighted fraction of mistakes δ̂_(m) is less than δ_(m)=(1−q_(1,m))+f_(1,m).

In these probability bounds the role in the exponent of D(q∥q*), for numbers q and q* in [0,1], is played by the relative entropy between the Bernoulli(q) and the Bernoulli(q*) distributions, even though these q and q* arise as expectations of weighted sums of many independent Bernoulli random variables.

Concerning the simplified bounds in the corollary, by the Pinsker-Csiszár-Kullback-Kemperman inequality, specialized to Bernoulli distributions, the expressions of the form D(q∥q*) in the above exceed 2(q−q*)². This specialization gives rise to the e^(−2L_(π)η²) bound when the q_(1,k) and q̃_(1,k) differ from q_(1,k)* by the amount η.

The e^(−2L_(π)η²) bound arises alternatively by applying Hoeffding's inequality for sums of bounded independent random variables to the weighted combinations of Bernoulli random variables that arise with respect to the distribution ℚ. As an aside, it is remarked that order η² is the proper characterization of D(q∥q*) only for the middle region of steps when q_(1,k)* is neither near 0 nor near 1. There are larger exponents toward the ends of the interval (0,1) because Bernoulli random variables have less variability there.

To handle the exponents (B−1)D(p∥p*) at the small values p=p_(1,k)=f_(1,k)/(B−1) and p*=p_(1,k)*=f_(1,k)*/(B−1), use the Poisson lower bound on the Bernoulli relative entropy, shown in the appendix. This produces the lower bound (B−1)[p_(1,k) log p_(1,k)/p_(1,k)*+p_(1,k)*−p_(1,k)], which is equal to f_(1,k) log f_(1,k)/f_(1,k)*+f_(1,k)*−f_(1,k). Write this value as f_(1,k)*𝒟(ρ_(k)) or equivalently f_(1,k)𝒟(ρ_(k))/ρ_(k), where the functions 𝒟(ρ) and 𝒟(ρ)/ρ=log ρ−1+1/ρ are increasing in ρ≧1.

If one used f_(1,k)=kf and f_(1,k)*=kf* in fixed ratio ρ=f/f*, this lower bound on the exponent would be kf𝒟(ρ)/ρ, as small as f𝒟(ρ)/ρ at k=1. Instead, keeping f_(1,k) locked at f̄, which is at least f*ρ, and keeping f_(1,k)*=kf* less than or equal to mf*=f̄*, the ratio ρ_(k) will be at least ρ and the exponents will be at least as large as f̄𝒟(ρ)/ρ.

Finally, there is the matter of the refined exponent in II. As in the above proof, the heart of the matter is the consideration of the probability ℚ{q̂_(w,1,k)<q_(1,k)}. Fix a value of k between 1 and m. Recall that q̂_(w,1,k)=Σ_(j sent)π_(j)1_(H_(w,k,j)). Bound the probability of the event that the sum of the independent random variables ξ_(j)=−π_(j)(1_(H_(w,k,j))−Φ̄_(j)) exceeds η, where Φ̄_(j)=Φ(shift_(k,j)−τ)=ℚ(H_(w,k,j)) provides the centering so that the ξ_(j) have mean 0. Recognize that Φ̄_(j) is Φ(μ_(j)(x)−τ), evaluated at x=q_(1,k−1)^(adj), and it is the same as used in the evaluation of the q_(1,k)*, the expected value of q̂_(w,1,k), which is g_(L)(x). The random variables ξ_(j) have magnitude bounded by max_(j)π_(j)=1/L_(π) and variance υ_(j)=π_(j)²Φ̄_(j)(1−Φ̄_(j)). Thus bound

ℚ{q̂_(w,1,k)<q_(1,k)} by Bernstein's inequality, where the sums are understood to be for j in sent,

${{{\mathbb{Q}}\left\{ {{\sum\limits_{j}\;\xi_{j}} \geq \eta} \right\}} \leq {\exp\left\{ {- \frac{\eta^{2}}{2\left\lbrack {{V/L} + {\eta/\left( {3L_{\pi}} \right)}} \right\rbrack}} \right\}}},$where here η=η_(k) is the difference between the mean q_(1,k)* and q_(1,k), and V/L=Σ_(j)υ_(j)=Σ_(j)π_(j)²Φ̄_(j)(1−Φ̄_(j)) is the total variance. It is V_(k)/L given by

${{V(x)}/L} = {\sum\limits_{j}\;{\pi_{j}^{2}{\Phi\left( {\mu_{j}(x)} \right)}\left( {1 - {\Phi\left( {\mu_{j}(x)} \right)}} \right)}}$evaluated at x=q_(1,k−1)^(adj). This completes the demonstration of Theorem 8.
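A hypothetical one-line helper evaluates the Bernstein bound just displayed:

```python
import math

def bernstein_tail(eta, V, L, L_pi):
    # exp{ -eta^2 / (2 [V/L + eta/(3 L_pi)]) }, the displayed Bernstein bound
    return math.exp(-eta**2 / (2.0 * (V / L + eta / (3.0 * L_pi))))
```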

If one were to use the crude bound on the total variance of (max_(j)π_(j))Σ_(j)π_(j)(¼)=1/(4L_(π)), the result in II would be no better than the exp{−2L_(π)η²} bound that arises from the Hoeffding bound.

The variable power assignments to be studied arrange Φ̄_(j)(1−Φ̄_(j)) to be small for most j in sent. Indeed, a comparison of the sum V(x)/L to an integral, in a manner similar to the analysis of g_(L)(x) in an upcoming section, shows that V(x) is not more than a constant times 1/τ, which is of order 1/√(log B), by the calculation in Appendix 14.7. This produces, with a positive constant const, a bound of the form exp{−const·L min{η,η²√(log B)}}. Equivalently, in terms of n=(L log B)/R, the exponent is at least a constant times n min{η²/√(log B),η/log B}. This exponential bound is an improvement on the other bounds in Theorem 8, by a factor of √(log B) in the exponent for a range of values of η up to 1/√(log B), provided of course that η<gap to permit the required increase in q_(1,k). For the best rates obtained here, η will need to be of order 1/log B, to within a log log factor, matching the order of 𝒞−R. So this improvement brings the exponent to within a √(log B) factor of best possible.

Other bounds on the total variance are evident. For instance, from Σ_(j)π_(j)Φ̄_(j)(1−Φ̄_(j)) less than both Σ_(j)π_(j)Φ̄_(j) and Σ_(j)π_(j)(1−Φ̄_(j)), it follows that V(x)/L≦(1/L_(π))min{g_(L)(x),1−g_(L)(x)}. This reveals that there is considerable improvement in the exponents provided by the Bernstein bound for the early and later steps where g_(L)(x) is near 0 or 1, even improving the order of the bounds there. This does not alter the fact that the decoder must experience the effect of the exponents for steps with x near the middle of the interval from 0 to 1, where the previously mentioned bound on V(x) produces an exponent of order Lη²√(log B).

For the above, data-driven weights λ are used, with which the errorprobability in a union bound had to be multiplied by a factor of {tildeover (L)}^(k−1), for each step k, to account for the size of the set ofpossible weight vectors.

Below a slight modification to the above procedure is described usingdeterministic λ that does away with this factor, thus demonstratingincreased reliability for given rates below capacity. The procedureinvolves choosing each dec_(k) to be a subset of the terms abovethreshold, with the π weighted size of this set very near apre-specified value pace_(k).

7.2 An Alternative Approach:

As mentioned earlier, instead of making dec_(k), the set of decoded terms for step k, equal to thresh_(k), one may take dec_(k) for each step to be a subset of thresh_(k) so that its size accept_(k) is near a deterministic quantity called pace_(k). This will yield a sum accept_(k)^(tot) near Σ_(k′=1)^(k)pace_(k′), which is arranged to match q_(1,k). Again abbreviate accept_(k)^(tot) as acc_(k)^(tot) and accept_(k) as acc_(k).

In particular, setting pace_(k)=q_(1,k)^(adj)−q_(1,k−1)^(adj), the set dec_(k) is chosen by selecting terms in J_(k) that are above threshold, in decreasing order of their 𝒵_(k,j)^(comb) values, until for each k the accumulated amount nearly equals q_(1,k)^(adj). In particular, given acc_(k−1)^(tot), one continues to add terms to dec_(k), if possible, until their sum satisfies the following requirement: q_(1,k)^(adj)−1/L_(π)<acc_(k)^(tot)≦q_(1,k)^(adj), where recall that 1/L_(π) is the maximum weight among all j in J. It is a small term of order 1/L.

Of course the set of terms thresh_(k) might not be large enough toarrange for accept_(k) satisfying the above requirement. Nevertheless,it is satisfied, provided

${{acc}_{k - 1}^{tot} + {\sum\limits_{j \in {thresh}_{k}}\;\pi_{j}}} \geq q_{1,k}^{adj}$or equivalently,

${{\sum\limits_{j \in {dec}_{1,{k - 1}}}\;\pi_{j}} + {\sum\limits_{j \in {J - {dec}_{1,{k - 1}}}}\;{\pi_{j}1_{\mathcal{H}_{k,j}}}}} \geq {q_{1,k}^{adj}.}$Here for convenience take dec₀=dec_(1,0) as the empty set.
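A Python sketch of the pacing rule just described (names and array layout assumed for illustration):

```python
def pick_decoded_set(stats, pi, candidates, acc_prev_tot, q_adj_target):
    # stats        : combined statistic value for each term j
    # pi           : weight pi_j for each term j
    # candidates   : indices in J_k that are above threshold on step k
    # acc_prev_tot : acc_(k-1)^tot accumulated from the earlier steps
    # Take candidates in decreasing order of their statistic; since each
    # pi_j is at most 1/L_pi, stopping just before an overshoot lands the
    # total in (q_adj_target - 1/L_pi, q_adj_target] when enough terms exist.
    order = sorted(candidates, key=lambda j: -stats[j])
    dec, total = [], acc_prev_tot
    for j in order:
        if total + pi[j] > q_adj_target:  # adding j would overshoot the target
            break
        dec.append(j)
        total += pi[j]
    return dec, total                     # total is the new acc_k^tot
```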

To demonstrate satisfaction of this condition, note that the left side is at least the value one has if the indicator 1_(ℋ_(k,j)) is imposed for each j and if one restricts to j in sent, which is the value q̂_(1,k)^(above)=Σ_(j∈sent)π_(j)1_(ℋ_(k,j)). Analysis for this case demonstrates, for each k, that the inequality q̂_(1,k)^(above)>q_(1,k) holds with high probability, which in turn exceeds q_(1,k)^(adj). So then the above requirement is satisfied for each step, with high probability, and thence acc_(k) matches pace_(k) to within 1/L_(π).

This q̂_(1,k)^(above) corresponds to the quantity studied in the previous section, giving the weighted total of terms in sent for which the combined statistic is above threshold, and it remains likely that it exceeds the purified statistic q̂_(1,k). What is different is that the control on the size of the previously decoded sets allows for constant weights of combination.

In the previous procedure random weights ŵ_(k)^(acc) were employed in the assignment of the λ_(1,k), λ_(2,k), . . . , λ_(k,k) used in the definition of 𝒵_(k,j)^(comb) and Z_(k,j)^(comb), where recall that ŵ_(k)^(acc)=1/(1−acc_(k−1)^(tot)ν)−1/(1−acc_(k−2)^(tot)ν). Here, since each acc_(k′)^(tot) is near a deterministic quantity, namely q_(1,k′)^(adj), replace ŵ_(k)^(acc) by a deterministic quantity w_(k)* given by,

${w_{k}^{*} = {\frac{1}{\left( {1 - {q_{1,{k - 1}}^{adj}v}} \right)} - \frac{1}{\left( {1 - {q_{1,{k - 2}}^{adj}v}} \right)}}},$and use the corresponding vector λ* with coordinatesλ_(k′,k)*=w_(k′)*/[1+w₂*+ . . . +w_(k)*] for k′=1 to k.

Earlier the inequality ŵ_(k)≦ŵ_(k) ^(acc) was demonstrated, whichallowed quantification of the shift factor in each step. Analogously,the following result is obtained for the current procedure usingdeterministic weights.

Lemma 10.

For k′<k, assume the decoding sets dec_(1,k′) are arranged so that the corresponding acc_(k′)^(tot) takes value in the interval (q_(1,k′)^(adj)−1/L_(π),q_(1,k′)^(adj)]. Then ŵ_(k)≦w_(k)*+ε₁, where ε₁=ν/(L_(π)(1−ν)²)=snr(1+snr)/L_(π) is a small term of order 1/L. Likewise, ŵ_(k′)≦w_(k′)*+ε₁ holds for k′<k as well.

Demonstration of Lemma 10:

The q̂_(k′) and f̂_(k′) are the weighted sizes of the sets of true terms and false alarms, respectively, retaining that which is actually decoded on step k′, not merely above threshold. These have sum q̂_(k′)+f̂_(k′)=acc_(k′), nearly equal to pace_(k′), taken here to be q_(1,k′)^(adj)−q_(1,k′−1)^(adj). Let's establish the inequalities q̂₁^(adj)+ . . . +q̂_(k−1)^(adj)≦q_(1,k−1)^(adj) and q̂_(k−1)^(adj)≦q_(1,k−1)^(adj)−q_(1,k−2)^(adj)+1/L_(π). The first inequality uses that each q̂_(k′)^(adj) is not more than q̂_(k′), which is not more than q̂_(k′)+f̂_(k′), equal to acc_(k′), which sums to acc_(k−1)^(tot), not more than q_(1,k−1)^(adj). The second inequality is a consequence of the fact that q̂_(k−1)^(adj)≦acc_(k−1)^(tot)−acc_(k−2)^(tot). Using the bounds on acc_(k−1)^(tot) and acc_(k−2)^(tot) gives the claimed inequality.

These two inequalities yield

${\hat{w}}_{k} \leq {\frac{\left( {q_{1,{k - 1}}^{adj} - q_{1,{k - 2}}^{adj} + {1/L_{\pi}}} \right)v}{\left( {1 - {q_{1,{k - 1}}^{adj}v}} \right)\left( {1 - {q_{1,{k - 2}}^{adj}v}} \right)}.}$The right side can be written as,

$\frac{1}{1 - {q_{1,{k - 1}}^{adj}v}} - \frac{1}{1 - {q_{1,{k - 2}}^{adj}v}} + {\frac{{1/L_{\pi}}v}{\left( {1 - {q_{1,{k - 1}}^{adj}v}} \right)\left( {1 - {q_{1,{k - 2}}^{adj}v}} \right)}.}$Now bound the last term using q_(1,k−1) ^(adj) and q_(1,k−2) ^(adj) lessthan 1 to complete the demonstration of Lemma 10.

Define the exception set A_(q,above)=∪_(k′=1)^(k−1){q̂_(1,k′)^(above)<q_(1,k′)}. In some expressions above is abbreviated as abv. Also recall the set A_(f)=∪_(k′=1)^(k−1){f̂_(k′)^(tot)>f_(1,k′)}. For convenience suppress the dependence on k in these sets.

Outside of A_(q,abv), the q̂_(1,k′)^(abv) is at least q_(1,k′) and hence at least q_(1,k′)^(adj) for each 1≦k′<k, ensuring that for each such k′ one can get decoding sets dec_(k′) such that the corresponding acc_(k′)^(tot) is at most 1/L_(π) below q_(1,k′)^(adj). Thus the requirements of Lemma 10 are satisfied outside this set.

Now proceed to lower bound the shift factor for step k outside ofA_(q,abv)∪A_(f).

For the above choice of λ=λ* the shift factor is equal to the ratio

$\frac{1 + \sqrt{{\hat{w}}_{2}w_{2}^{*}} + \ldots + \sqrt{{\hat{w}}_{k}w_{k}^{*}}}{\sqrt{1 + w_{2}^{*} + \ldots + w_{k}^{*}}}.$Using the above lemma and the fact that √(a−b)≧√(a)−√(b), obtain that the above is greater than or equal to

$\frac{1 + {\hat{w}}_{2} + \ldots + {\hat{w}}_{k}}{\sqrt{1 + w_{2}^{*} + \ldots + w_{k}^{*}}} - {\sqrt{\varepsilon_{1}}{\frac{\sqrt{{\hat{w}}_{2}} + \ldots + \sqrt{{\hat{w}}_{k}}}{\sqrt{1 + w_{2}^{*} + \ldots + w_{k}^{*}}}.}}$Now use the fact that √(ŵ₂)+ . . . +√(ŵ_(k))≦√(k)√(ŵ₂+ . . . +ŵ_(k)) to bound the second term by ε₂=√(ε₁)√(k)√(ν/(1−ν)), which is snr√((1+snr)k/L_(π)), a term of order near 1/√(L). Hence the shift factor is at least,

$\frac{1 + {\hat{w}}_{2} + \ldots + {\hat{w}}_{k}}{\sqrt{1 + w_{2}^{*} + \ldots + w_{k}^{*}}} - {\varepsilon_{2}.}$Consequently, it is at least

$\frac{\sqrt{1 - {q_{1,{k - 1}}^{adj}v}}}{1 - {{\hat{q}}_{k - 1}^{{tot},{adj}}v}} - {\varepsilon_{2},}$where recall that q̂_(k−1)^(tot,adj)=q̂_(k−1)^(tot)/(1+f̂_(k−1)^(tot)/q̂_(k−1)^(tot)). Here it is used that 1+ŵ₂+ . . . +ŵ_(k), which is 1/(1−q̂_(k−1)^(adj,tot)ν), can be bounded from below by 1/(1−q̂_(k−1)^(tot,adj)ν) using Lemma 4.

Similar to before, note that q_(1,k−1)^(adj) and q̂_(k−1)^(tot,adj) are close to each other when the false alarm effects are small. Hence write this shift factor in the form

$\sqrt{\frac{1 - {\hat{h}}_{f,{k - 1}}}{1 - {{\hat{q}}_{k - 1}^{{tot},{adj}}v}}}$as before. Again find that ĥ_(f,k−1)≦2f̂_(k−1)^(tot)snr+ε₃ outside of the exception set A_(q,abv). Here

${\varepsilon_{3} = {\frac{snr}{L_{\pi}} + {2\,\varepsilon_{2}}}},$which is a term of order 1/√(L).

To confirm the above, use the inequality √(1−a)−√(b)≧√(1−c), where c=a+2√(b). Here a=(q_(1,k−1)^(adj)−q̂_(k−1)^(tot,adj))ν/(1−q̂_(k−1)^(tot,adj)ν) and b=ε₂²(1−q̂_(k−1)^(tot,adj)ν). Noting that the numerator in a is at most (1/L_(π)+2f̂_(k−1)^(tot)−(f̂_(k−1)^(tot))²)ν outside of A_(q,abv), and that 0≦q̂_(k−1)^(tot,adj)≦1, one obtains the bound for ĥ_(f,k−1).

Next, recall outside of the exception set A_(f)∪A_(q,abv) that q̂_(k−1)^(tot)≧q_(1,k−1) and f̂_(k−1)^(tot)≦f_(1,k−1). This leads to the shift factor being at least

$\sqrt{\frac{1 - h_{f,{k - 1}}}{1 - {q_{1,{k - 1}}^{adj}v}}},$where h_(f,k)=2f_(1,k)snr+ε₃. As before, assume a bound f_(1,k)≦f̄, so that h_(f,k) is not more than h_(f)=2f̄snr+ε₃, independent of k.

As done herein previously, create the combined statistics 𝒵_(k,j)^(comb), now using the deterministic λ*. For j in other this 𝒵_(k,j)^(comb) equals Z_(k,j)^(comb), and for j in sent, when outside the exception set A_(abv)=A_(q,abv)∪A_(f)∪A_(h), this combination exceeds

${{\sqrt{\frac{1 - h^{\prime}}{1 - {q_{1,{k - 1}}^{adj}v}}}\sqrt{C_{j,R,B}}1_{j\mspace{14mu}{sent}}} + Z_{k,j}^{comb}},$where (1−h′)=(1−h)(1−h_(f)) as before, though with h_(f) larger by the small amount ε₃. Again obtain shift_(k,j)=√(C_(j,R,B,h)/(1−xν)) evaluated at x=q_(1,k−1)^(adj), with C_(j,R,B,h) as before.

Analogous to Theorem 8, reliability after m steps of the decoder is demonstrated by bounding the probability of the exception set A=∪_(k=1)^(m)A_(k), where A_(k) is the union of the events {q̂_(1,k)^(abv)≦q_(1,k)}, {f̂_(k)^(tot)≧f_(1,k)} and {χ_(n−k+1)²/n<1−h}. Thus the proof of Theorem 8 carries over, only now it is not required to take the union over the grid of values of the weights. The analogous theorem with the resulting improved bounds is now stated.

Theorem 11.

Under the same assumptions as in Theorem 8, the m step adaptive successive decoder, using deterministic pacing with pace_(k)=q_(1,k)^(adj)−q_(1,k−1)^(adj), incurs a weighted fraction of errors δ̂_(m) less than δ_(m)=f_(1,m)+(1−q_(1,m)), except in an event of probability not more than

$\sum\limits_{k = 1}^{m}\left\lbrack e^{-L_{\pi}D(q_{1,k} \| q_{1,k}^{*}) + c_{0}k} \right\rbrack + \sum\limits_{k = 1}^{m}\left\lbrack e^{-L_{\pi}(B-1)D(p_{1,k} \| p_{1,k}^{*})} \right\rbrack + \sum\limits_{k = 1}^{m} e^{-(n-k+1)D_{h_{k}}},$where the bound also holds if the exponent L_(π)D(q_(1,k)∥q_(1,k)*) is replaced by

$L{\frac{\eta_{k}^{2}}{V_{k} + {\left( {1/3} \right){\eta_{k}\left( {L/L_{\pi}} \right)}}}.}$In the constant gap bound case, with positive η and f̄ and m≧2, satisfying the same hypotheses as in the previous corollary, the probability of δ̂_(m) greater than δ*+η+f̄ is not more than m e^(−2L_(π)η²+mc₀) + m e^(−L_(π)f̄𝒟(ρ)/ρ) + m e^(−(n−m+1)h_(m)²/2). Furthermore, using the variance V_(k) and allowing a variable gap bound gap_(k)≦g_(L)(x_(k))−x_(k) and 0<f_(1,k)+η_(k)<gap_(k), with difference gap′=gap_(k)−f_(1,k)−η_(k) and number of steps m≦1+1/gap′, and with ρ_(k)=f_(1,k)/f_(1,k)*>1, this probability bound also holds with the exponent

$L\mspace{14mu}{\min\limits_{k}{\eta_{k}^{2}/\left\lbrack {V_{k} + {\left( {1/3} \right){\eta_{k}\left( {L/L_{\pi}} \right)}}} \right\rbrack}}$in place of 2L_(π)η², and with min_(k) f_(1,k)𝒟(ρ_(k))/ρ_(k) in place of f̄𝒟(ρ)/ρ, where the minima are taken over k from 1 to m.

The bounds are the same as in Theorem 8 and its corollary, except for improvement due to the absence of the factors L̃^(k−1). In the same manner as discussed there, there are choices of f̄, ρ and h such that the exponents for the false alarms and the chi-square contributions are at least as good as for the q_(1,k), so that the bound becomes 3m e^(−2L_(π)η²+mc₀).

It is remarked that for the particular variable power allocation rule studied in the upcoming sections, as said, the update function g_(L)(x) will be seen to be ultimately insensitive to L, with g_(L)(x)−x rapidly approaching a function g(x)−x at rate 1/L uniformly in x. Indeed, a gap bound for g_(L) will be seen to take a form gap_(L)=gap*−θ/L_(π) for some constant θ, so that it approaches the value of the gap determined by g, denoted gap*, where note that L and L_(π) agree to within a constant factor. Accordingly, using gap*−θ/L_(π) in apportioning the values of η, f̄, and 1/(m−1), these values are likewise ultimately insensitive to L. Indeed, slight adjustment to the rate allows arrangement of a gap independent of L.

Nevertheless, to see if there be any effect on the exponent, suppose for a specified η* that η=η*−θ/L_(π) represents a corresponding reduction in η due to finite L. Consider the exponential bound e^(−2L_(π)η²). Expanding the square, it is seen that the exponent L_(π)η², which is L_(π)(η*−θ/L_(π))², is at least L_(π)(η*)² minus a term 2θη* that is negligible in comparison. Thus the approach of η to η* is sufficiently rapid that the probability bound remains close to what it would be, e^(−2L_(π)(η*)²), if one were to ignore the effect of the θ/L_(π), where it is used that L_(π)(η*)² is large, and that η* is small, e.g., of the order of 1/log B.

8 Computational Illustrations

An important part of the invention herein is a device for evaluating the performance of a decoder depending on the parameters of the design, including L, B, a, snr, the choice of power allocations, and the amount that the rate is below capacity. The heart of this device is the successive evaluation of the update function g_(L)(x). Accordingly, the performance of the decoder is illustrated. First, for fixed values of the design parameters and rates below capacity, evaluate the detection rate as well as the probability of the exception set P_(ε) using the theoretical bounds given in Theorem 11. Plots demonstrating the progression of the decoder are also shown in specific figures. These highlight the crucial role of the function g_(L) in achieving performance objectives.

FIGS. 4, 5, and 6 present the results of computation using the reliability bounds of Theorem 11 for fixed L and B and various choices of snr and rates below capacity. The dots in these figures denote q_(1,k)^(adj) for each k, and the step function joining these dots highlights how q_(1,k)^(adj) is computed from q_(1,k−1)^(adj). For large L these q_(1,k)^(adj)'s would be near q_(1,k), the lower bound on the proportion of sections decoded after k passes. In this extreme case q_(1,k) would match g_(L)(q_(1,k−1)), so that the dots would lie on the function.

For illustrative purposes take B=2¹⁶, L=B and snr values of 1, 7 and 15 in these three figures. For each snr value the maximum rate, over a grid of values, is determined, for which there is a particular control on the error probability. With snr=1 (FIG. 6), this rate R is 0.3 bits, which is 59¾% of capacity. When snr is 7 and 15 (FIGS. 4 and 5), respectively, these rates correspond to 49.5% and 43.5% of their corresponding capacities.

Specifically, for FIG. 5, with snr=15, the variable power allocation wasused with P_((l)) proportional to e^(−2Cl/L) and a=1.3375. The weighted(unweighted) detection rate is 0.995 (0.983) for a failed detection rateof 0.017 and the false alarm rate is 0.006. The probability of mistakeslarger than these targets is bounded by 5.4×10⁻⁴.

For FIG. 6, with snr=1, constant power allocation was used for thesections with a=0.6625. The detection rate (both weighted andun-weighted) is 0.944 and the false alarm and failed detection rates are0.016 and 0.056 respectively, with the corresponding error probabilitybounded by 2.1×10⁻⁴.

Specifics for FIG. 4 with snr=7 were already discussed in theintroduction.

The error probability in these calculations is controlled as follows.Arrange each of the 3m terms in the probability bound to take the samevalue, set in these examples to be ε=10⁻⁵. In particular, compute insuccession appropriate values of q_(1,k)* and f_(1,k)*=kf*, using anevaluation of the function g_(L)(x), an L term sum, evaluated at a pointdetermined from the previous step, and from these determine q_(1,k) andf_(1,k).

This means solving for the q_(1,k) less than q_(1,k)* such that e^(−L_(π)D(q_(1,k)∥q_(1,k)*)+c₀k) equals ε, and, with p_(1,k)*=f_(1,k)*/(B−1), solving for the p_(1,k) greater than p_(1,k)* such that the corresponding term e^(−L_(π)(B−1)D(p_(1,k)∥p_(1,k)*)) also equals ε. In this way, the largest q_(1,k) less than q_(1,k)* is used, that is, the smallest η_(k), and the smallest false alarm bound f_(1,k), for which the respective contributions to the error probability bound are not worse than the prescribed value.

These are numerically simple to solve because D(q∥q*) is convex and monotone in q<q*, and likewise for D(p∥p*) for p>p*. Likewise arrange h_(k) so that e^(−(n−k+1)D_(h_(k))) matches ε.

Taking advantage of the Bernstein bound sometimes yields a smaller η_(k) by solving for the choice satisfying the quadratic equation Lη_(k)²/[V_(k)+(⅓)η_(k)L/L_(π)]=log 1/ε+c₀k, where V_(k) is computed by an evaluation of V(x), which like g_(L)(x) is an L term sum, both of which are evaluated at x=q_(1,k−1)^(adj).

These computation steps continue as long as (1−q_(1,k))+f_(1,k)decreases, thus yielding the choice of the number of steps m.
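A hedged sketch of this per-step bookkeeping follows (Python; the helper names and the bisection tolerance are illustrative choices). It exploits the monotonicity of the relative entropy noted above to solve e^(−L_(π)D(q∥q*)+c₀k)=ε and e^(−L_(π)(B−1)D(p∥p*))=ε by bisection.

import math

def bern_kl(q, qstar):
    # D(q||q*) for Bernoulli(q) versus Bernoulli(q*), with 0 < q, q* < 1.
    return (q * math.log(q / qstar)
            + (1.0 - q) * math.log((1.0 - q) / (1.0 - qstar)))

def solve_q(qstar, L_pi, c0, k, eps, tol=1e-12):
    # Largest q < q* with L_pi*D(q||q*) = log(1/eps) + c0*k; D decreases
    # in q on (0, q*), so bisect.  Assumes a root exists in (0, q*).
    target = (math.log(1.0 / eps) + c0 * k) / L_pi
    lo, hi = tol, qstar - tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bern_kl(mid, qstar) > target:
            lo = mid    # divergence too large: move q up toward q*
        else:
            hi = mid
    return lo

def solve_p(pstar, L_pi, B, eps, tol=1e-12):
    # Smallest p > p* with L_pi*(B-1)*D(p||p*) = log(1/eps); D increases
    # in p on (p*, 1), so bisect.
    target = math.log(1.0 / eps) / (L_pi * (B - 1.0))
    lo, hi = pstar + tol, 1.0 - tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bern_kl(mid, pstar) > target:
            hi = mid    # divergence too large: move p down toward p*
        else:
            lo = mid
    return hi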

For these computations choose power allocations proportional to max{e^(−2γ(l−1)/L), e^(−2γ)(1+δ_(c))}, with 0≦γ≦C. Here the choices of a, c and γ are made, by computational search, to minimize the resulting sum of false alarms and failed detections, per the bounds. In the snr=1 case the optimum γ is 0, so there is constant power allocation in this case. In the other two cases, there is variable power across most of the sections, the role of a positive c being to increase the relative power allocation for sections with low weights. Note, in the analytical results for maximum achievable rates as a function of B as given in the upcoming sections, γ is constrained to be equal to C, though the methodology does extend to the case of γ<C.
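A sketch of constructing such an allocation, under the stated form and with illustrative parameter names (Python), might read:

import math

def leveled_power_allocation(L, gamma, delta_c, P=1.0):
    # P_(l) proportional to max{exp(-2*gamma*(l-1)/L), exp(-2*gamma)*(1+delta_c)},
    # normalized so that the section powers sum to the total power P.
    floor = math.exp(-2.0 * gamma) * (1.0 + delta_c)
    raw = [max(math.exp(-2.0 * gamma * (l - 1) / L), floor)
           for l in range(1, L + 1)]
    total = sum(raw)
    return [P * r / total for r in raw]

Setting γ=0 recovers the constant power allocation found optimal in the snr=1 case; the computational search over a, c and γ described above is not shown.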

FIGS. 7, 8, and 9 give plots of achievable rates as a function of B for snr values of 15, 7 and 1. The section error rate is controlled to be between 9 and 10%. For the curve using simulation runs, the rates are exhibited for which the empirical probability of making more than 10% section mistakes is near 10⁻³.

For each B, the points on the detailed envelope correspond to thenumerically evaluated maximum inner code rate for which the sectionmistake rate is between 9 and 10%. Here assume L to be large, so thatthe q_(1,k)'s and f_(k)'s are replaced by the expected values q_(1,k)*and f_(k)*, respectively. Also take h=0. This gives an idea about thebest possible rates for a given snr and section mistake rate.

For the simulation curve, L was fixed at 100 and, for given snr, B and rate values, 10⁴ runs of our decoder were performed. The maximum rate over the grid of values satisfying a section error rate of less than 10% except in 10 replicates (corresponding to an estimated P_(ε) of 10⁻³) is shown in the plots. Interestingly, even for such small values of L the curve is quite close to the detailed envelope curve, showing that the theoretical bounds herein are quite conservative.

9 Accumulative g

This section complements the previous computational results andreliability results to analytically quantify, in the finite L and Bcase, conditions on the rate moderately close to capacity, such that theupdate function g_(L)(x) is indeed accumulative for a suitable positivegap and an x* near 1.

In particular, normalized power allocation weights π_((l)) are developed in subsection 1, including slight modification to the exponential form. An integral approximation g(x) to the sum g_(L)(x) is provided in subsection 2. Subsection 3 examines the behavior of g_(L)(x) for x near 1, including introduction of x* via a parameter r₁ related to an amount of permitted rate drop and a parameter ζ related to the amount of shift at x*. For cases with monotone decreasing g(x)−x, as in the unmodified weight case, the behavior for x near 1 suffices to demonstrate that g_(L)(x) is accumulative. Improved closeness of the rate to capacity is shown in the finite codelength case by allowance of the modifications to the weight via the parameter δ_(c). But with this modification monotonicity is lost. In subsection 4, a bound on the number of oscillations of g(x)−x is established that is used in showing that g_(L)(x) is accumulative. The location of x* and the value of δ_(c) both impact the mistake rate δ_(mis) and the amount of rate drop required for g_(L)(x) to be accumulative, expressed through a quantity introduced there denoted r_(crit). Subsection 5 provides optimization of δ_(c). Helpful inequalities in controlling the rate drop are in subsection 6. Subsection 7 provides optimization of the contribution to the total rate drop of the choice of location of x*, via optimization of ζ.

Recall that g_(L)(x) for 0≦x≦1 is the function given by

${{g_{L}(x)} = {\sum\limits_{l = 1}^{L}\;{\pi_{(l)}{\Phi\left( {{\mu_{l}(x)} - \tau} \right)}}}},$where here denote μ_(l)(x)=shift_(l,x)=√{square root over(C_(l,R,B,h)/(1−xν))}. Recursively, q_(1,k) is obtained fromq_(1,k)*=g_(L)(x) evaluated at x=q_(1,k−1) ^(adj), in succession for kfrom 1 to m.

The values of Φ(μ_(l)(x)−τ), as a function of l from 1 to L, provide what is interpreted as the probability with which the term sent from section l has an approximate test statistic value above threshold, when the previous step successfully had an adjusted weighted fraction above threshold equal to x. The Φ(μ_(l)(x)−τ) is increasing in x regardless of the choice of π_((l)), though the level reached depends on the choice of this power allocation.

9.1 Variable Power Allocations:

Consider two closely related schemes for allocating the power. First suppose P_((l)) is proportional to e^(−2C(l−1)/L), as motivated in the introduction. Then the weight for section l is π_((l)) given by P_((l))/P. In this case recall that C_(l,R)=π_((l))Lν/(2R) simplifies to u_(l) times the constant C/R, where

$u_{l} = {\exp\left\{ {{- 2}\; C\frac{l - 1}{L}} \right\}}$

for sections l from 1 to L. The presence of the factor C/R, when it is at least 1, increases the value of g_(L)(x) above what it would be if that factor were not there and helps in establishing that it is accumulative.

As l varies from 1 to L, the u_(l) ranges from 1 down to the value e^(−2C)=1−ν.

To roughly explain the behavior, as shall be seen, this choice of power allocation produces values of Φ(μ_(l)(x)−τ) that are near 1 for l with u_(l) enough less than 1−xν and near 0 for values of u_(l) enough greater than 1−xν, with a region of l in between, in which there will be a scatter of sections with statistics above threshold. Though it is roughly successful in reaching an x near 1, the fraction of detections is limited, if R is too close to C, by the fact that μ_(l)(x) is not large for a portion of l near the right end, of the order 1/√{square root over (2 log B)}.

Therefore, the power allocation is modified, taking π_((l)) to be proportional to an expression that is equal to

$u_{l} = {\exp\left\{ {{- 2}\; C\frac{l - 1}{L}} \right\}}$

except for large l/L, where it is leveled to be not less than a value u_(cut)=e^(−2C)(1+δ_(c)), which exceeds (1−ν)=e^(−2C)=1/(1+snr), using a small positive δ_(c). This δ_(c) is constrained to be between 0 and snr so that u_(cut) is not more than 1. Thus let π_((l)) be proportional to ũ_(l) given by max{u_(l), u_(cut)}. The idea is that by leveling the height to a slightly larger value for l/L near 1, nearly all sections are arranged to have ũ_(l) above (1−xν) when x is near 1. This will allow the objective to be reached with an R closer to C. The required normalization could adversely affect the rate, but it will be seen to be of a smaller order of 1/(2 log B).

To produce the normalized π_((l))=max{u_(l), u_(cut)}/(L sum), compute

${sum} = {\sum\limits_{l = 1}^{L}\;{\max\left\{ {u_{l},u_{cut}} \right\}{\left( {1/L} \right).}}}$

If δ_(c)=0 this sum equals ν/(2C) as previously seen. If δ_(c)>0 and u_(cut)<1, it is the sum of two parts, depending on whether u_(l) is greater than or not greater than u_(cut). This sum can be computed exactly, but to produce a simplified expression note that in replacing the sum by the corresponding integral

integ=∫₀¹max{e^(−2Ct), u_(cut)}dt

an error of at most 1/L is incurred. For each L there is a θ with 0≦θ≦1 such that sum=integ+θ/L. In the integral, comparing e^(−2Ct) to u_(cut) corresponds to comparing t to t_(cut) equal to [1/(2C)] log 1/u_(cut). Splitting the integral accordingly, it is seen to equal [1/(2C)](1−u_(cut)) plus u_(cut)(1−t_(cut)), which may be expressed as

${{integ} = {\frac{v}{2\; C}\left\lbrack {1 + {{D\left( \delta_{c} \right)}/{snr}}} \right\rbrack}},$

where D(δ)=(1+δ)log(1+δ)−δ. For δ≧0, the function D(δ) is not more than δ²/2, which is a tight bound for small δ. This [1+D(δ_(c))/snr] factor in the normalization represents a cost of the introduction of the otherwise helpful δ_(c). Nevertheless, this remainder D(δ_(c))/snr is small compared to δ_(c), when δ_(c) is small compared to the snr. It might appear that D(δ_(c))/snr could get large if snr were small, but, in fact, since δ_(c)≦snr the D(δ_(c))/snr remains less than snr/2.
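As an illustrative numerical sanity check of this relation (not part of the specification), one may verify that the L term average matches the integral to within 1/L:

import math

snr, L = 7.0, 2**16
C = 0.5 * math.log(1.0 + snr)          # capacity in nats
nu = snr / (1.0 + snr)
delta_c = 0.2                           # illustrative leveling parameter
u_cut = (1.0 - nu) * (1.0 + delta_c)

s = sum(max(math.exp(-2.0 * C * (l - 1) / L), u_cut)
        for l in range(1, L + 1)) / L   # the quantity called sum
D = (1.0 + delta_c) * math.log(1.0 + delta_c) - delta_c
integ = (nu / (2.0 * C)) * (1.0 + D / snr)
assert abs(s - integ) < 1.0 / L         # sum = integ + theta/L, 0 <= theta <= 1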

Accordingly, from the above relationship to the integral, the sum may be expressed as

${{sum} = {\frac{v}{2\; C}\left\lbrack {1 + \delta_{sum}^{2}} \right\rbrack}},$

where δ_(sum)² is equal to D(δ_(c))/snr+2θC/(Lν), which is not more than δ_(c)²/(2snr)+2C/(Lν). Thus

$\pi_{(l)} = {\frac{\max\left\{ {u_{l},u_{cut}} \right\}}{L\mspace{14mu}{sum}} = {\frac{2\; C}{Lv}{\frac{\max\left\{ {u_{l},u_{cut}} \right\}}{1 + \delta_{sum}^{2}}.}}}$In this case C_(l,R,B,h)=(π_(l)Lν/(2R))(1−h′)(2 log B) may be written

${C_{l,R,B,h} = {\max\left\{ {u_{l},u_{cut}} \right\}\frac{C\left( {1 - h^{\prime}} \right)}{R\left( {1 + \delta_{sum}^{2}} \right)}\left( {2\;\log\; B} \right)}},$or equivalently, using τ=√{square root over (2 log B)}(1+δ_(a)), this is

max{u_(l), u_(cut)}(C′/R)τ², where

$C^{\prime} = {\frac{C\left( {1 - h^{\prime}} \right)}{\left( {1 + \delta_{sum}^{2}} \right)\left( {1 + \delta_{a}} \right)^{2}}.}$

For small δ_(c), δ_(a), and h′ this is a value near the capacity C. As seen later, the best choices of these parameters make C′ less than capacity by an amount of order log log B/log B. When δ_(c)=0 the C/(1+δ_(sum)²) is the rate quantity considered previously herein, and its closeness to capacity is controlled by δ_(sum)²≦2C/(νL).

In contrast, if δ_(c) were taken to be the maximum permitted, which is δ_(c)=snr, then the power allocation would revert to the constant allocation rule, with an exact match of the integral and the sum, so that 1+δ_(sum)²=1+D(snr)/snr and the C/(1+δ_(sum)²) simplifies to R₀=(½)snr/(1+snr), which, as said herein, is a rate target substantially inferior to C, unless the snr is small.

Now μ_(l)(x)−τ, which is √{square root over (C_(l,R,B,h)/(1−xν))}−τ, may be written as the function μ(x,u)=(√{square root over (u/(1−xν))}−1)τ evaluated at u=max{u_(l), u_(cut)}(C′/R). For later reference note that the μ_(l)(x) here, and hence g_(L)(x), both depend on x and the rate R only through the quantity (1−xν)R/C′.

Note also that μ(x, u) is of order τ and whether it is positive ornegative depends on whether or not u exceeds 1−xν in accordance with thediscussion above.

9.2 Formulation and Evaluation of the Integral g(x):

The function that updates the target fraction of correct decodings is

${g_{L}(x)} = {\sum\limits_{l = 1}^{L}\;{\pi_{(l)}{\Phi\left( {{\mu_{l}(x)} - \tau} \right)}}}$which, for the variable power allocation with allowance for leveling,takes the form

${\sum\limits_{l = 1}^{L}\;{\pi_{(l)}{\Phi\left( {\mu\left( {x,{\max\left\{ {u_{l},u_{cut}} \right\}{C^{\prime}/R}}} \right)} \right)}}},$with

$u_{l} = {{\mathbb{e}}^{{- 2}\; C\frac{l - 1}{L}}.}$From the above expression for π_((l)), this g_(L)(x) is equal to

$\frac{2\; C}{vL}{\sum\limits_{l = 1}^{L}\;{\frac{\max\left\{ {u_{l},u_{cut}} \right\}}{1 + \delta_{sum}^{2}}{{\Phi\left( {\mu\left( {x,{\max\left\{ {u_{l},u_{cut}} \right\}{C^{\prime}/R}}} \right)} \right)}.}}}$Recognize that this sum corresponds closely to an integral. In eachinterval

$\frac{l - 1}{L} \leq t < \frac{l}{L}$ for l from 1 to L, one has ${\mathbb{e}}^{{- 2}\; C\frac{l - 1}{L}}$ at least ${\mathbb{e}}^{{- 2}\;{Ct}}$.

Consequently, g_(L)(x) is greater than g_(num)(x)/(1+δ_(sum)²), where the numerator g_(num)(x) is the integral

$\frac{2\; C}{v}{\int_{0}^{1}{\max\left\{ {{\mathbb{e}}^{{- 2}\;{Ct}},u_{cut}} \right\}{\Phi\left( \ {\mu\left( {x,{\max\left\{ {{\mathbb{e}}^{{- 2}\;{Ct}},u_{cut}} \right\}{C^{\prime}/R}}} \right)} \right)}{{\mathbb{d}t}.}}}$Accordingly, the quantity of interest g_(L)(x) has value at least(integ/sum)g(x) where

${g(x)} = {\frac{g_{num}(x)}{1 + {{D\left( \delta_{c} \right)}/{snr}}}.}$Using

$\frac{integ}{sum} = {1 - {\frac{2\theta\; C}{L\; v}\frac{1}{1 + \delta_{sum}^{2}}}}$

and using that g_(L)(x)≦1, and hence g_(num)(x)/(1+δ_(sum)²)≦1, it follows that g_(L)(x)≧g(x)−2C/(Lν). The g_(L)(x) and g(x) are increasing functions of x on [0, 1].

Let's provide further characterization and evaluation of the integral g_(num)(x) for the variable power allocation. Let z_(x)^(low)=μ(x, u_(cut)C′/R) and z_(x)^(max)=μ(x, C′/R). These have z_(x)^(low)≦z_(x)^(max), with equality only in the constant power case (where u_(cut)=1). For emphasis, write out that z_(x)=z_(x)^(low) takes the form

$z_{x} = {\left\lbrack {\frac{\sqrt{u_{cut}{C^{\prime}/R}}}{\sqrt{1 - {xv}}} - 1} \right\rbrack{\tau.}}$Set u_(x)=1−xν.

Lemma 12.

Integral evaluation. The g_(num)(x) has a representation as the integral with respect to the standard normal density φ(z) of the function that takes the value 1+D(δ_(c))/snr for z less than z_(x)^(low), takes the value

$x + {\frac{u_{x}}{v}\left( {1 - {\frac{R}{C^{\prime}}\left( {1 + {z/\tau}} \right)^{2}}} \right)}$

for z between z_(x)^(low) and z_(x)^(max), and takes the value 0 for z greater than z_(x)^(max). This yields that g_(num)(x) is equal to

${{\left\lbrack {1 + {{D\left( \delta_{c} \right)}/{snr}}} \right\rbrack{\Phi\left( z_{x}^{low} \right)}} + {\left\lbrack {x + {\delta_{R}\frac{u_{x}}{v}}} \right\rbrack\left\lbrack {{\Phi\left( z_{x}^{\max} \right)} - {\Phi\left( z_{x}^{low} \right)}} \right\rbrack} + {\frac{2\; R}{C^{\prime}}\frac{u_{x}}{v}\frac{\left\lbrack {{\phi\left( z_{x}^{\max} \right)} - {\phi\left( z_{x}^{low} \right)}} \right\rbrack}{\tau}} + {\frac{R}{C^{\prime}}\frac{u_{x}}{v}\frac{\left\lbrack {{z_{x}^{\max}{\phi\left( z_{x}^{\max} \right)}} - {z_{x}^{low}{\phi\left( z_{x}^{low} \right)}}} \right\rbrack}{\tau^{2}}}},\mspace{20mu}{where}$$\mspace{20mu}{\delta_{R} = {1 - {{\frac{R}{C^{\prime}}\left\lbrack {1 + {1/\tau^{2}}} \right\rbrack}.}}}$

This δ_(R) is non-negative if R≦C′/(1+1/τ²).

In the constant power case, corresponding to u_(cut)=1, the conclusionis consistent with the simpler g(x)=Φ(z_(x)).

The integrand above has value near x+(1−R/C′)u_(x)/ν, if z is not too far from 0. The heart of the matter for analysis in this section is that this value is at least x for rates R≦C′.

Demonstration of Lemma 12:

By definition, the function g_(num)(x) is

${\frac{2\; C}{v}{\int_{0}^{1}{\max\left\{ {{\mathbb{e}}^{{- 2}\;{Ct}},u_{cut}} \right\}{\Phi\left( \ {\mu\left( {x,{\max\left\{ {{\mathbb{e}}^{{- 2}\;{Ct}},u_{cut}} \right\}{C^{\prime}/R}}} \right)} \right)}{\mathbb{d}t}}}},$which is equal to the integral

$\frac{2\; C}{v}{\int_{0}^{t_{cut}}{{\mathbb{e}}^{{- 2}\;{Ct}}{\Phi\left( \ {\mu\left( {x,{{\mathbb{e}}^{{- 2}\;{Ct}}{C^{\prime}/R}}} \right)} \right)}{\mathbb{d}t}}}$plus the expression

${\frac{2\; C}{v}\left( {1 - t_{cut}} \right)u_{cut}{\Phi\left( z_{x}^{low} \right)}},$

which can also be written as [δ_(c)+D(δ_(c))]Φ(z_(x)^(low))/snr.

Change the variable of integration from t to u=e^(−2Ct), to produce the simplified expression for the integral

$\frac{1}{v}{\int_{u_{cut}}^{1}{{\Phi\left( {\mu\left( {x,{{uC}^{\prime}/R}} \right)} \right)}\ {{\mathbb{d}u}.}}}$Add and subtract the value Φ(z_(x) ^(low)) in the integral to write itas [(1−u_(cut))/ν]Φ(z_(x) ^(low)), which is [1−δ_(c)/snr]Φ(z_(x)^(low)), plus the integral

$\frac{1}{v}{\int_{u_{cut}}^{1}{\left\lbrack {{\Phi\left( {\mu\left( {x,{{uC}^{\prime}/R}} \right)} \right)} - {\Phi\left( {\mu\left( {x,{u_{cut}{C^{\prime}/R}}} \right)} \right)}} \right\rbrack\ {{\mathbb{d}u}.}}}$

Now since Φ(b)−Φ(a)=∫1_({a<z<b})φ(z)dz, it follows that this integral equals ∫∫1_({u_(cut)≦u≦1})1_({z≦μ(x,uC′/R)})φ(z)dz du/ν.

Switch the order of integration. In the integral, the inequality z≦μ(x,uC′/R) is the same as u≧u_(x)(R/C′)(1+z/τ)², which exceeds u_(cut) for z greater than z_(x)^(low). Here u_(x)=1−xν. This determines an interval of values of u. For z between z_(x)^(low) and z_(x)^(max) the length of this interval of values of u is equal to 1−(R/C′)u_(x)(1+z/τ)². Using u_(x)=1−xν one sees that this interval length, when divided by ν, may be written as

${x + {\frac{u_{x}}{v}\left( {1 - {\frac{R}{C^{\prime}}\left( {1 + {z/\tau}} \right)^{2}}} \right)}},$a quadratic function of z.

Integrate with respect to φ(z). The resulting value of g_(num)(x) may beexpressed as

${{\left\lbrack {1 + {{D\left( \delta_{c} \right)}/{snr}}} \right\rbrack{\Phi\left( z_{x}^{low} \right)}} + {\frac{1}{v}{\int_{z_{x}^{low}}^{z_{x}^{\max}}{\left\lbrack {1 - {\left( {R/C^{\prime}} \right){u_{x}\left( {1 + {z/\tau}} \right)}^{2}}} \right\rbrack{\phi(z)}\ {\mathbb{d}z}}}}},$To evaluate, expand the square (1+z/τ)² in the integrand as1+2z/τ+z²/τ². Multiply by φ(z) and integrate. For the term linear in z,use zφ(z)=−φ′(z) for which its integral is a difference in values ofφ(z) at the two end points. Likewise, for the term involvingz²=1+(z²−1), use (z²−1)=−(zφ(z))′ which integrates to a difference invalues of zφ(z). Of course the constant multiples of φ(z) integrate to adifference in values of Φ(z). The result for the integral matches whatis stated in the Lemma. This completes the demonstration of Lemma 12.
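The closed form of Lemma 12 can be spot-checked numerically against direct quadrature of the defining integral. The following illustrative sketch (Python, arbitrary parameter choices, simple midpoint quadrature, and C′ taken equal to C) does so:

import math

def Phi(z): return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
def phi(z): return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

snr, B = 7.0, 2**16
C = 0.5 * math.log(1.0 + snr)
nu = snr / (1.0 + snr)
tau = math.sqrt(2.0 * math.log(B))
delta_c = 0.2
u_cut = (1.0 - nu) * (1.0 + delta_c)
Cp, R, x = C, 0.8 * C, 0.7
u_x = 1.0 - x * nu

def mu(x, u):
    return (math.sqrt(u / (1.0 - x * nu)) - 1.0) * tau

# Midpoint quadrature of g_num(x).
N = 200000
quad = 0.0
for i in range(N):
    t = (i + 0.5) / N
    u = max(math.exp(-2.0 * C * t), u_cut)
    quad += u * Phi(mu(x, u * Cp / R))
quad *= 2.0 * C / (nu * N)

# Closed form from Lemma 12.
z_lo, z_hi = mu(x, u_cut * Cp / R), mu(x, Cp / R)
D = (1.0 + delta_c) * math.log(1.0 + delta_c) - delta_c
delta_R = 1.0 - (R / Cp) * (1.0 + 1.0 / tau**2)
closed = ((1.0 + D / snr) * Phi(z_lo)
          + (x + delta_R * u_x / nu) * (Phi(z_hi) - Phi(z_lo))
          + (2.0 * R / Cp) * (u_x / nu) * (phi(z_hi) - phi(z_lo)) / tau
          + (R / Cp) * (u_x / nu)
            * (z_hi * phi(z_hi) - z_lo * phi(z_lo)) / tau**2)
assert abs(quad - closed) < 1e-4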

One sees that the integral g_(num)(x) may also be expressed as

${{\frac{1}{snr}\left\lbrack {\delta_{c} + {D\left( \delta_{c} \right)}} \right\rbrack}{\Phi\left( z_{x} \right)}} + {\frac{1}{v}{\int{\left\lbrack {1 - {\max\left\{ {{u_{x}\frac{R}{C^{\prime}}\left( {1 + {z/\tau}} \right)_{+}^{2}},u_{cut}} \right\}}} \right\rbrack_{+}{\phi(z)}{{\mathbb{d}z}.}}}}$

To reconcile this form with the integral given in the Lemma, one notes that the integrand here for z below z_(x) takes the form of a particular constant value times φ(z), which, when integrated, provides a contribution that adds to the term involving Φ(z_(x)).

Corollary 13.

Derivative evaluation. The derivative g′_(num)(x) is equal to

${\frac{\tau}{2}\left( {1 + \frac{z_{x}}{\tau}} \right)^{3}{\phi\left( z_{x} \right)}\frac{R}{C^{\prime}}{\log\left( {1 + \delta_{c}} \right)}} + {\int_{z_{x}}^{z_{x}^{\max}}{\frac{R}{C^{\prime}}\left( {1 + {z/\tau}} \right)^{2}{\phi(z)}\ {{\mathbb{d}z}.}}}$In particular if δ_(c)=0 the derivative g′(x) is

${\frac{R}{C^{\prime}}{\int_{z_{x}^{low}}^{z_{x}^{\max}}{\left( {1 + {z/\tau}} \right)^{2}{\phi(z)}{\mathbb{d}z}}}},$

and then, if also R=C′/(1+r/τ²) with r≧1, that is, if R≦C′/(1+1/τ²), the difference g(x)−x is a decreasing function of x.

Demonstration:

Consider the last expression given for g_(num)(x). The part [δ_(c)+D(δ_(c))]Φ(z_(x))/snr has derivative z_(x)′[δ_(c)+D(δ_(c))]φ(z_(x))/snr. Use (1+z_(x)/τ)=√{square root over (u_(cut)(C′/R)/(1−xν))} to evaluate z_(x)′ as

$z_{x}^{\prime} = {\frac{v}{2}\frac{1}{\left( {1 - {xv}} \right)^{3/2}}\sqrt{u_{cut}{C^{\prime}/R}}\tau}$

and obtain that it is (ν/2)(1+z_(x)/τ)³τ/(u_(cut)C′/R). So using u_(cut)=(1−ν)(1+δ_(c)), the z_(x)′ is equal to

$\frac{snr}{2}\left( {1 + {z_{x}/\tau}} \right)^{3}\frac{\tau}{\left( {1 + \delta_{c}} \right)}{\frac{R}{C^{\prime}}.}$

Thus, using the form of D(δ_(c)) and simplifying, the derivative of this part of g_(num) is the first part of the expression stated in the corollary.

As for the integral in the expression for g_(num), its integrand iscontinuous and piecewise differentiable in x, and the integral of itsderivative is the second part of the expression in the Lemma. Directevaluation confirms that it is the derivative of the integral.

In the δ_(c)=0 case, this derivative specializes to the indicatedexpression which is less than

${{\frac{R}{C^{\prime}}{\int_{- \infty}^{\infty}{\left( {1 + {z/\tau}} \right)^{2}{\phi(z)}\ {\mathbb{d}z}}}} = {\frac{R}{C^{\prime}}\left\lbrack {1 + {1/\tau^{2}}} \right\rbrack}},$which by the choice of R is less than 1. Then g(x)−x is decreasing as ithas a negative derivative. This completes the demonstration of Corollary13.
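The δ_(c)=0 case of this derivative evaluation can likewise be checked by finite differences, as in the following illustrative sketch (Python, arbitrary parameters, C′ taken equal to C; at δ_(c)=0 the level u_(cut)=1−ν never binds):

import math

def Phi(z): return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
def phi(z): return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

snr, B = 7.0, 2**16
C = 0.5 * math.log(1.0 + snr)
nu = snr / (1.0 + snr)
tau = math.sqrt(2.0 * math.log(B))
Cp, R = C, 0.8 * C

def mu(x, u):
    return (math.sqrt(u / (1.0 - x * nu)) - 1.0) * tau

def g_num(x, N=20000):
    # Midpoint quadrature; with delta_c = 0, max{e^(-2Ct), u_cut} = e^(-2Ct).
    s = 0.0
    for i in range(N):
        t = (i + 0.5) / N
        u = math.exp(-2.0 * C * t)
        s += u * Phi(mu(x, u * Cp / R))
    return 2.0 * C / (nu * N) * s

def g_num_deriv(x, N=20000):
    # (R/C') times the integral over [z_x^low, z_x^max] of (1 + z/tau)^2 phi(z) dz.
    z_lo, z_hi = mu(x, (1.0 - nu) * Cp / R), mu(x, Cp / R)
    s = 0.0
    for i in range(N):
        z = z_lo + (i + 0.5) * (z_hi - z_lo) / N
        s += (1.0 + z / tau)**2 * phi(z)
    return (R / Cp) * s * (z_hi - z_lo) / N

x, h = 0.7, 1e-5
fd = (g_num(x + h) - g_num(x - h)) / (2.0 * h)
assert abs(fd - g_num_deriv(x)) < 1e-3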

Corollary 14.

A lower bound. The g_(num)(x) is at least g_(low)(x) given by

${\left\lbrack {1 + {{D\left( \delta_{c} \right)}/{snr}}} \right\rbrack{\Phi\left( z_{x} \right)}} + {\frac{1}{v}{\int_{z_{x}^{low}}^{\infty}{\left\lbrack {1 - {\left( {R/C^{\prime}} \right){u_{x}\left( {1 + {z/\tau}} \right)}^{2}}} \right\rbrack{\phi(z)}\ {{\mathbb{d}z}.}}}}$

It has the analogous integral characterization as given immediately preceding Corollary 13, but with removal of the outer positive part restriction. Moreover, the function g_(low)(x)−x may be expressed as

$\mspace{20mu}{{{g_{low}(x)} - x} = {\left( {1 - {xv}} \right)\frac{R}{{vC}^{\prime}}\frac{A\left( z_{x} \right)}{\tau^{2}}}}$  where$\frac{A(z)}{\tau^{2}} = {{\frac{C^{\prime}}{R} - \left( {1 + {1/\tau^{2}}} \right) - \frac{{2{{\tau\phi}(z)}} + {z\;{\phi(z)}}}{\tau^{2}} + {\left\lbrack {1 + {1/\tau^{2}} - {\left( {1 - \Delta_{c}} \right)\left( {1 + {z/\tau}} \right)^{2}}} \right\rbrack{{\Phi(z)}.\mspace{20mu}{with}}\mspace{14mu}\Delta_{c}}} = {{\log\left( {1 + \delta_{c}} \right)}.}}$

Optionally, the expression for g_(low)(x)−x may be written entirely interms of z=z_(x) by noting that

${\left( {1 - {xv}} \right)\frac{R}{{vC}^{\prime}}} = {\frac{\left( {1 + \delta_{c}} \right)}{{{snr}\left( {1 + {z/\tau}} \right)}^{2}}.}$

Demonstration: The integral expressions for g_(low)(x) are the same asfor g_(num)(x) except that the upper end point of the integrationextends beyond z_(x) ^(max), where the integrand is negative, i.e., theouter restriction to the positive part is removed. The lower boundconclusion follows from this negativity of the integrand above z_(x)^(max). Evaluate g_(low)(x) as in the proof of Lemma 12, using for theupper end that Φ(z) tends to 1, while φ(z) and zφ(z) tend to 0 as z→∞,to obtain that g_(low)(x) is equal to

${\left\lbrack {1 + {{D\left( \delta_{c} \right)}/{snr}}} \right\rbrack{\Phi\left( z_{x} \right)}} + {\left\lbrack {x + {\delta_{R}\frac{u_{x}}{v}}} \right\rbrack\left\lbrack {1 - {\Phi\left( z_{x} \right)}} \right\rbrack} - {2\frac{R}{C^{\prime}}\frac{u_{x}}{v}\frac{\phi\left( z_{x} \right)}{\tau}} - {\frac{R}{C^{\prime}}\frac{u_{x}}{v}{\frac{z_{x}{\phi\left( z_{x} \right)}}{\tau^{2}}.}}$

Replace the x+δ_(R)u_(x)/ν with the equivalent expression (1/ν)[1−u_(x)(R/C′)(1+1/τ²)]. Group together the terms that are multiplied by u_(x)(R/C′)/ν to be part of A/τ². Among what is left is 1/ν. Adding and subtracting x, this 1/ν is x+u_(x)/ν, which is x+[(u_(x)/ν)R/C′][C′/R]. This provides the x term and contributes the C′/R term to A/τ².

It then remains to handle [1+D(δ_(c))/snr]Φ(z_(x))−(1/ν)Φ(z_(x)), which is −(1/snr)[1−D(δ_(c))]Φ(z_(x)). Multiplying and dividing it by

$\frac{v\; C^{\prime}}{u_{x}R} = \frac{{{snr}\left( {1 + {z/\tau}} \right)}^{2}}{1 + \delta_{c}}$

and then noting that (1−D(δ_(c)))/(1+δ_(c)) equals 1−Δ_(c), it provides the associated term of A/τ². This completes the demonstration of Corollary 14.

What is gained with this lower bound is simplification, because the result depends only on z_(x)=z_(x)^(low) and not also on z_(x)^(max).

9.3 Values of g(x) Near 1:

From the expression for x in terms of z, when R is near C′, the point z=0 corresponds to a value of x near 1−δ_(c)/snr. This relationship is used to establish reference values of x* and z* and to bound how close g(x*) is to 1.

A convenient choice of x* satisfies (1−x*ν)R=(1−ν)C′. More flexible is to allow other values of x* by choosing it along with a value r₁ to satisfy the condition (1−x*ν)R=(1−ν)C′/(1+r₁/τ²). Also call the solution x=x_(up). When r₁ is positive the x* is increased. Negative r₁ is allowed as long as r₁>−τ², but keep r₁ small compared to τ so that x* remains near 1.

With the rate R taken to be not more than C′, write it as

$R = {\frac{C^{\prime}}{\left( {1 + {r/\tau^{2}}} \right)}.}$

Lemma 15.

A value of x* near 1. Let R′=C′/(1+r₁/τ²). For any rate R between R′/(1+snr) and R′, the x* as defined above is between 0 and 1 and satisfies

${1 - x^{*}} = {\frac{R^{\prime} - R}{R\;{snr}} = {\frac{r - r_{1}}{{snr}\left( {\tau^{2} + r_{1}} \right)} = {\left( {1 - {x^{*}v}} \right){\frac{r - r_{1}}{v\left( {\tau^{2} + r} \right)}.}}}}$

It is near 1 if R is near R′. The value of z_(x) at x*, denoted z*=ζ, satisfies (1+ζ/τ)²=(1+δ_(c))(1+r₁/τ²).

This relationship has δ_(c) near 2ζ/τ, when ζ and r₁ are small in comparison to τ. The δ_(c)τ and r₁ are arranged, usually both positive, and of the order of a power of a logarithm of τ, just large enough that Φ̄(ζ)=1−Φ(ζ) contributes to a small shortfall, yet not so large that it overly impacts the rate.

Demonstration of Lemma 15:

The expression 1−xν may also be written (1−ν)+(1−x)ν. So the above condition may be written 1+(1−x*)snr=R′/R, which yields the first two equalities. It also may be written (1−x*ν)=(1−ν)(1+r/τ²)/(1+r₁/τ²), which yields the third equality in that same line.

Next recall that z_(x)=μ(x,u_(cut)C′/R), which is

$z_{x} = {\left\lbrack {\frac{\sqrt{u_{cut}{C^{\prime}/R}}}{\sqrt{1 - {xv}}} - 1} \right\rbrack{\tau.}}$

Recalling that u_(cut)=(1−ν)(1+δ_(c)), at x* it is z*=ζ given by ζ=(√{square root over ((1+δ_(c))(1+r₁/τ²))}−1)τ, or, rearranging, express δ_(c) in terms of z*=ζ and r₁ via 1+δ_(c)=(1+ζ/τ)²/(1+r₁/τ²), which is the last claim. This completes the demonstration of Lemma 15.
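A small worked instance of Lemma 15 (Python, illustrative parameter values, C′ taken equal to C) confirming that the expressions for 1−x* agree:

import math

snr, B = 7.0, 2**16
C = 0.5 * math.log(1.0 + snr)
nu = snr / (1.0 + snr)
tau = math.sqrt(2.0 * math.log(B))
Cp = C
r1, r = 1.0, 4.0                  # illustrative rate drop parameters
R = Cp / (1.0 + r / tau**2)
Rp = Cp / (1.0 + r1 / tau**2)     # R'

x_star = (1.0 - (1.0 - nu) * Rp / R) / nu   # from (1 - x*nu)R = (1 - nu)R'
assert abs((1.0 - x_star)
           - (r - r1) / (snr * (tau**2 + r1))) < 1e-12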

Because of this relationship one may just as well arrange u_(cut) in the first place via ζ as u_(cut)=(1−ν)(1+ζ/τ)²/(1+r₁/τ²), where suitable choices for ζ and r₁ will be found in an upcoming section. Also keep the δ_(c) formulation, as it is handy in expressing the effect on normalization via D(δ_(c)).

Come now to the evaluation of g_(L)(x*) and its lower bound via g_(low)(x*). Since g_(L)(x) depends on x and R only via the expression (1−xν)R/C′, the choice of x* such that this expression is fixed at (1−ν)/(1+r₁/τ²) implies that the value g_(L)(x*) is invariant to R, depending only on the remaining parameters snr, δ_(c), r₁, τ and L. Naturally then, the same is true of the lower bound via g_(low)(x*), which depends only on snr, δ_(c), r₁ and τ.

Lemma 16.

The value g(x*) is near 1. For the variable power case with 0≦δ_(c)<snr, the shortfall expressed by 1−g(x*) is less than

${\delta^{*} = \frac{{2\tau\;{\phi(\zeta)}} + {\zeta\;{\phi(\zeta)}} + {rem}}{{{snr}\left( {\tau^{2} + r_{1}} \right)}\left\lbrack {1 + {{D\left( \delta_{c} \right)}/{snr}}} \right\rbrack}},$

independent of the rate R≦R′, where the remainder is given by rem=[(τ²+r₁)D(δ_(c))−(r₁−1)]Φ̄(ζ). Moreover, g_(L)(x*) has shortfall δ_(L)*=1−g_(L)(x*) not more than δ*+2C/(Lν). In the constant power case, corresponding to δ_(c)=snr, the shortfall is δ*=1−Φ(ζ)=Φ̄(ζ).

Setting

$\zeta = \sqrt{2\;\log\frac{\tau}{d\sqrt{2\pi}}}$

with a constant d and τ>d√{square root over (2π)}, with δ_(c) small, this δ* is near 2d/(snr τ²), whereas, with δ_(c)=snr, using Φ̄(ζ)≦φ(ζ)/ζ, it is not more than d/(ζτ).

Demonstration of Lemma 16

Using the lower bound on g(x*), the shortfall has the upper bound

$\delta^{*} = {1 - \frac{g_{low}\left( x^{*} \right)}{1 + {{D\left( \delta_{c} \right)}/{snr}}}}$

which equals

$\frac{1 - {g_{low}\left( x^{*} \right)} + {{D\left( \delta_{c} \right)}/{snr}}}{1 + {{D\left( \delta_{c} \right)}/{snr}}}.$

Use the formula for g_(low)(x*) in the proof of Corollary 14. For this evaluation note that at x=x* the expression u_(x)R/(νC′) simplifies to 1/[snr(1+r₁/τ²)] and the expression x+δ_(R)u_(x)/ν becomes

$1 + {\frac{r_{1} - 1}{{snr}\left( {\tau^{2} + r_{1}} \right)}.}$Consequently, g_(low)(x*) equals

$1 + {\frac{D\left( \delta_{c} \right)}{snr}{\Phi(\zeta)}} - {\frac{{2\tau\;{\phi(\zeta)}} + {{\zeta\phi}(\zeta)} - {\left( {r_{1} - 1} \right){\overset{\_}{\Phi}(\zeta)}}}{{snr}\left( {\tau^{2} + r_{1}} \right)}.}$This yields an expression for 1−g_(low)(x*)+D(δ_(c))/snr equal to

${{\frac{D\left( \delta_{c} \right)}{snr}}{\overset{\_}{\Phi}(\zeta)}} + {\frac{{\left( {{2\tau} + \zeta} \right){\phi(\zeta)}} - {\left( {r_{1} - 1} \right){\overset{\_}{\Phi}(\zeta)}}}{{snr}\left( {\tau^{2} + r_{1}} \right)}.}$

Group the terms involving Φ̄(ζ) to recognize this equals [(2τ+ζ)φ(ζ)+rem]/[snr(τ²+r₁)]. Then dividing by the expression 1+D(δ_(c))/snr produces the claimed bound.

As for evaluation at the choice ζ=√{square root over (2 log(τ/(d√{square root over (2π)})))}, this is the positive value for which φ(ζ)=d/τ, when τ≧d√{square root over (2π)}. It provides the main contribution with 2τφ(ζ)=2d. The ζφ(ζ) is then ζd/τ, which is of order √{square root over (log τ)}/τ, small compared to the main contribution 2d.

For the remainder rem, using Φ̄(ζ)≦φ(ζ)/ζ and D(δ_(c))≦(δ_(c))²/2 near 2ζ²/τ², the τ²D(δ_(c))Φ̄(ζ) is near 2ζφ(ζ)=2ζd/τ, again of order √{square root over (log τ)}/τ.

For δ_(L)*=1−g_(L)(x*), using g_(L)(x)≧g(x)−2C/(Lν) yields δ_(L)*≦δ*+2C/(Lν).

For the constant power case use g_(L)(x*)=g(x*)=Φ(ζ) directly, rather than g_(low)(x*). It has δ*=Φ̄(ζ), which is not more than φ(ζ)/ζ. This completes the demonstration of Lemma 16.

Corollary 17.

Mistake bound. The likely bound on the weighted fraction of failed detections and false alarms, δ_(L)*+η+f̄, corresponds to an unweighted fraction of not more than δ_(mis)=fac(δ_(L)*+η+f̄), where the factor fac=snr(1+δ_(sum)²)/[2C(1+δ_(c))]. In the variable power case the contribution δ_(mis,L)*=fac δ_(L)* is not more than δ_(mis)*+(1/L)(1+snr)/(1+δ_(c)) with

${\delta_{mis}^{*} = \frac{{\left( {{2\tau}\; + \zeta} \right)\;{\phi(\zeta)}} + {rem}}{2{C\left( {\tau + \zeta} \right)}^{2}}},$

while, in the constant power case δ_(c)=snr, the fac=1 and δ_(mis,L)* equals δ_(mis)*=Φ̄(ζ).

Closely related to δ_(mis)* in the variable power case is the simplified form δ_(mis,simp)*=[(2τ+ζ)φ(ζ)+rem]/(2Cτ²), for which δ_(mis)*=δ_(mis,simp)*/(1+ζ/τ)².

Demonstration of Corollary 17:

Multiplying the weighted fraction by the factor 1/[L min_(l)π_((l))], which equals the given fac, provides the upper bound on the (unweighted) fraction of mistakes δ_(mis)=fac(δ_(L)*+η+f̄). Now δ_(L)*=1−g_(L)(x*) has the upper bound

$\frac{1 - {g_{low}\left( x^{*} \right)} + \delta_{sum}^{2}}{1 + \delta_{sum}^{2}}.$

Multiplying by fac yields δ_(mis,L)*=fac δ_(L)* not more than

$\frac{1 - {g_{low}\left( x^{*} \right)} + \delta_{sum}^{2}}{\left( {2{C/{snr}}} \right)\left( {1 + \delta_{c}} \right)}.$

Recall that δ_(sum)² exceeds D(δ_(c))/snr by not more than 2C/(Lν) and that 1−g_(low)(x*)+D(δ_(c))/snr is less than [(2τ+ζ)φ(ζ)+rem]/[snr(τ²+r₁)]. So this yields the δ_(mis,L)* bound

$\frac{{\left( {{2\tau} + \zeta} \right){\phi(\zeta)}} + {rem}}{2{C\left( {\tau^{2} + r_{1}} \right)}\left( {1 + \delta_{c}} \right)} + {\frac{\left( {1 + {snr}} \right)}{\left( {1 + \delta_{c}} \right)}{\frac{1}{L}.}}$

Recognizing that the denominator product (τ²+r₁)(1+δ_(c)) simplifies to (τ+ζ)² establishes the claimed form of δ_(mis)*.

For the constant power case note that fac=1, so that δ_(mis,L)*=δ_(mis)* is then unchanged from δ*=Φ̄(ζ). This completes the demonstration of Corollary 17.

9.4 Showing g(x) is Greater than x:

This section shows that g_(L)(x) is accumulative, that is, it is atleast x for the interval from 0 to x*, under certain conditions on r.

Start by noting the size of the gap at x=x*.

Lemma 18.

The gap at x*. With rate R=C′/(1+r/τ²), the difference g(x*)−x* is at least

$\frac{r - r_{up}}{{snr}\left( {\tau^{2} + r_{1}} \right)} = {\left( {1 - {x^{*}v}} \right){\frac{r - r_{up}}{v\left( {\tau^{2} + r} \right)}.}}$

Here, 0≦δ_(c)<snr, with rem as given in Lemma 16,

$r_{up} = {r_{1} + \frac{{\left( {{2\tau} + \zeta} \right){\phi(\zeta)}} + {rem}}{1 + {{D\left( \delta_{c} \right)}/{snr}}}}$

while, for δ_(c)=snr, r_(up)=r₁+snr(τ²+r₁)Φ̄(ζ), which satisfies

$\frac{r_{up}}{\tau^{2}} = {\frac{\left( {1 + {{snr}\;{\overset{\_}{\Phi}(\zeta)}}} \right)\left( {1 + {\zeta/\tau}} \right)^{2}}{1 + {snr}} - 1.}$

Keep in mind that the rate target C′ depends on δ_(c). For small δ_(c) it is near the capacity C, whereas for δ_(c)=snr it is near R₀>0.

If the upcoming gap properties permit, it is desirable to set r nearr_(up). Then the factor in the denominator of the rate becomes near1+r_(up)/τ². In some cases r_(up) is negative, permitting 1+r/τ² notmore than 1.

It is recalled that r₁, ζ, and δ_(c) are related by

${1 + {r_{1}/\tau^{2}}} = {\frac{\left( {1 + {\zeta/\tau}} \right)^{2}}{1 + \delta_{c}}.}$

Demonstration of Lemma 18:

The gap at x* equals g(x*)−x*. This value is the difference of 1−x* and δ*=1−g(x*), for which the bounds of the previous two lemmas hold. Recalling that 1−x* equals (r−r₁)/[snr(τ²+r₁)], adjust the subtraction of r₁ to include in r_(up) what is needed to account for δ*, to obtain the indicated expressions for g(x*)−x* and r_(up). Alternative expressions arise by using the relationship that r₁ has to the other parameters. This completes the demonstration of Lemma 18.

Positivity of this gap at x* entails r>r_(up), and positivity of x* requires snr(τ²+r₁)+r₁≧r. There is an interval of such r provided snr(τ²+r₁)>r_(up)−r₁.

For these next two corollaries, take the case that either δ_(c)=snr or δ_(c)=0, that is, either the power allocation is constant (completely level), or the power P_((l)) is proportional to

${u_{\ell} = {\exp\left\{ {{- 2}C\frac{\ell - 1}{L}} \right\}}},$unmodified (no leveling). The idea in both cases is to look for whetherthe minimum of the gap occurs at x* under stated conditions.

Corollary 19.

Positivity of g(x)−x with constant power. Suppose R=C′/(1+r/τ²) where, with constant power, the C′ equals R₀(1−h′)/(1+δ_(a))², and suppose ντ≧2(1+r/τ²)√{square root over (2π)}. Suppose r−r_(up) is positive with r_(up) as given in Lemma 18, specific to this δ_(c)=snr case. If r≧0 and if r−r_(up) is less than ν(τ+ζ)²/2, then, for 0≦x≦x*, the difference g(x)−x is at least

${{gap} = {\frac{r - r_{up}}{{snr}\left( {\tau^{2} + r_{1}} \right)} = \frac{r - r_{up}}{{v\left( {\tau + \zeta} \right)}^{2}}}},$

whereas if r_(up)<r≦0 and if also

${{r/\tau} \geq {- \sqrt{2\;{\log\left( {v\;{{\tau\left( {1 + {r/\tau^{2}}} \right)}/2}\sqrt{2\pi}} \right)}}}},$

then the gap g(x)−x on [0, x*] is at least

$\min{\left\{ {{{1/2} + {r/\left( {\tau\sqrt{2\pi}} \right)}},\frac{r - r_{up}}{{v\left( {\tau + \zeta} \right)}^{2}}} \right\}.}$

In the latter case the minimum occurs at the second expression when r<r_(up)+ν(τ+ζ)²[½+r/(τ√{square root over (2π)})].

This corollary is proven in the appendix, where, under the statedconditions, it is shown that g(x)−x is unimodal for x≧0, so the value issmallest at x=0 or x=x*.

From the formula for r_(up) in this constant power case, it is negative, near −ντ², when snr Φ̄(ζ) and ζ/τ are small. It is tempting to try to set r close to r_(up), similarly negative. As discussed in the appendix, the conditions prevent pushing r too negative, and compromise choices are available. With ντ at least a little more than the constant 2√{square root over (2π)}e^(π/4), allow r with which the 1+r/τ² factor becomes at best near 1−√{square root over (2π)}/(2τ), indeed nice that it is not more than 1, though not as ambitious as the unobtainable 1+r_(up)/τ² near 1−ν.

Corollary 20.

Positivity of g(x)−x with no leveling. Suppose R=C′/(1+r/τ²) where, with δ_(c)=0, the C′ equals C(1−h′)/[(1+2C/(νL))(1+δ_(a))²], near capacity. Set r₁=0 and ζ=0, for which 1−x*=r/(snr τ²) and r_(up)=2τ/√{square root over (2π)}+½, and suppose in this case that r>r_(up). Then, for 0≦x≦x* the difference g(x)−x is greater than or equal to

${gap} = {\frac{r - r_{up}}{{snr}\left( {\tau^{2} + r_{1}} \right)}.}$Moreover, g(x)−x is at least (1−xν)GAP where

${GAP} = {\frac{r - r_{up}}{v\left( {\tau^{2} + r} \right)}.}$

Demonstration of Corollary 20:

With δ_(c)=0 the choice ζ=0 corresponds to r₁=0. At this ζ, the main part of r_(up) equals 2τ/√{square root over (2π)}, since φ(0)=1/√{square root over (2π)}, and the remainder rem equals ½ since Φ̄(0)=½. This produces the indicated value of r_(up). The monotonicity of g(x)−x in the δ_(c)=0 case yields, for x≦x*, a value at least as large as at x*, where it is bounded by Lemma 18. This yields the first claim.

Next use the representation of g(x)−x as (1−xν)A(z_(x))/[ν(τ²+r)], where with δ_(c)=0 the A(z) is A(z)=r−1−2τφ(z)−zφ(z)+[τ²+1−(τ+z)²]Φ(z). It has derivative which simplifies to A′(z)=−2(τ+z)Φ(z), which is negative for z>−τ, which includes the interval [z₀, z₁]. Accordingly A(z_(x)) is decreasing and its minimum for x in [0, x*] occurs at x*. Appealing to Lemma 18 completes the demonstration of Corollary 20.

An alert to the reader: The above result together with the reliability bounds provides a demonstration that g_(L)(x) is such that a rate C′/(1+r/τ²) is achieved with a moderately small fraction of mistakes, with high reliability. Here r/τ², at least r_(up)/τ², is nearly equal to a constant times 1/τ, which is near 1/√{square root over (π log B)}. This is what can be achieved in the comparatively straightforward fashion of the first half of the manuscript.
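To give a sense of scale (illustrative only), the following evaluates the rate drop factor 1+r_(up)/τ² of Corollary 20 for several dictionary sizes B and compares it with the 1/√(π log B) approximation just mentioned:

import math

for B in (2**8, 2**12, 2**16, 2**20):
    tau = math.sqrt(2.0 * math.log(B))
    r_up = 2.0 * tau / math.sqrt(2.0 * math.pi) + 0.5
    print(B,
          1.0 + r_up / tau**2,                          # exact factor
          1.0 + 1.0 / math.sqrt(math.pi * math.log(B))) # approximation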

Nevertheless, it would be better to have a bound with r_(up) of smaller order, so that for large B the rate is closer to capacity. For that reason, next take advantage of the modification to the power allocation in which it is slightly leveled using a small positive δ_(c). When C is not large, these modifications make an improved rate target C_(B) which is below capacity by an expression of order 1/log B, and likewise the fraction of mistakes target (corrected by the outer code) is improved to be an expression of order 1/log B. This is certainly an improvement over 1/√{square root over (log B)}, and as already said herein, the reliability and rate tradeoff is not far from optimal in this regime. That is, if one wants rate substantially closer to capacity it would necessitate worse reliability. Moreover, for any fixed R<C it is certainly the case that with sufficient size B the rate R is less than C_(B), so that the existing results take effect.

Nevertheless, both of the 1/√{square root over (log B)} and 1/log Bexpressions are not impressively small.

This has compelled the authors to push, in what follows in the rest of this manuscript, to squeeze out as much as one can concerning the constant factor or other lower order factors. Even factors of 2 or 4 very much matter when one only has a log B. There are myriad aspects of the problem (via freedoms of the specified design) in which efforts are made to push down the constants, and it persists as an active effort of the authors. Accordingly, it is anticipated that the inventors herein will provide further refinements of the material herein in the months to come.

Nevertheless, the comparatively simple results in the δ_(c)=0 case aboveand the bounds from δ_(c)>0 in what follows both constitute first proofsof performance achievement by practical schemes, that scale suitably inrate, reliability and complexity.

It is anticipated that some refinements will derive from the toolsprovided here by making slightly different specializations of the designparameters already introduced, to which the invention has associatedrights to determine the implications of these specializations.

It is the presence of a practical high-performance encoder and decoder, as well as the general tools of performance evaluation and of rate characterizations, e.g. via the update function g_(L)(x), that are the featured aspects of the invention to this point in the manuscript, and not the current value of the moderate constants that are developed in the pages to come.

The specific tools for refinement become detailed mathematical effortsthat go beyond what most readers would want to digest. Nevertheless,proceed forward with inclusion of these since it does lead to specificinstantiations of code parameter settings for which the drop fromcapacity is improved in its characteristics to be a specific multiple of1/log B.

Monotonicity or unimodality of g(x)−x or of g_(low)(x)−x is used in the above gap characterizations for the δ_(c)=0 and δ_(c)=snr cases. In what follows, analogous shape properties are presented that include the intermediate case.

9.5 Showing g(x)>x in the Case of Some Leveling:

Now allow small leveling of the power allocation, via choice of a smallδ_(c)>0 and explore determination of lower bounds on the gap.

Use the inequality g(x)≧g_(low)(x)/(1+D(δ_(c))/snr) so that

${{g(x)} - x} \geq {\frac{{g_{low}(x)} - x - {x\;{{D\left( \delta_{c} \right)}/{snr}}}}{1 + {{D\left( \delta_{c} \right)}/{snr}}}.}$

This gap lower bound is expressible in terms of z=z_(x) using the results of Corollary 14 and the expression for x given immediately thereafter. Indeed,

${{g_{low}(x)} - x} = {\frac{u_{x}R}{v\; C^{\prime}}\frac{A\left( z_{x} \right)}{\tau^{2}}}$

where for R=C′/(1+r/τ²) the function A(z) simplifies to r−1−2τφ(z)−zφ(z)+[τ²+1−(1−Δ_(c))(τ+z)²]Φ(z), where Δ_(c)=log(1+δ_(c)). The multiplier u_(x)R/(νC′) is also (1+δ_(c))/(snr(1+z/τ)²). From the expression for x in terms of z write

$x = {1 - \frac{\delta_{c}}{snr} + {\frac{\left( {1 + \delta_{c}} \right)}{{{snr}\left( {1 + {z/\tau}} \right)}^{2}}{\left( {\left( {1 + {z/\tau}} \right)^{2} - 1 - {r/\tau^{2}}} \right).}}}$Accordingly,g _(low)(x)−x−xD(δ_(c))/snr=G(z _(x))where G(z) is the function

${G(z)} = {{\frac{1 + \delta_{c}}{\left( {\tau + z} \right)^{2}}\frac{\overset{\sim}{A}(z)}{snr}} - {\frac{D\left( \delta_{c} \right)}{snr}\left( {1 - {\delta_{c}/{snr}}} \right)}}$with${\overset{\sim}{A}(z)} = {{A(z)} + {\frac{\tau^{2}{D\left( \delta_{c} \right)}}{snr}{\left( {1 - \left( {1 + {z/\tau}} \right)^{2} + {r/\tau^{2}}} \right).}}}$In this way the gap lower bound is expressed through the function G(z)evaluated at z=z_(x). Regions for x in [0,1] whereg_(low)(x)−x−xD(δ_(c))/snr is decreasing or increasing, havecorresponding regions of decrease or increase of G(z) in [z₀, z₁]. Thefollowing lemma characterizes the shape of the lower bound on the gap.

Definition:

A continuous function G(z) is said to be unimodal in an interval ifthere is a value z_(max) such that G(z) is increasing for any values tothe left of z_(max) and decreasing for any values to the right ofz_(max). This includes the case of decreasing or increasing functionswith z_(max) at the left or right end point of the interval,respectively.

Likewise, with domain starting at z₀, a continuous function G(z) is saidto have at most one oscillation if there is a value z_(G)≧z₀ such thatG(z) is decreasing for any values of z between z₀ and z_(G), andunimodal to the right of z_(G). Call the point z_(G) the critical valueof G.

Functions with at most one oscillation in an interval [z₀, z*] have theuseful conclusion that the minimum over the interval is determined bythe minimum of the values at z_(G) and z*.

Lemma 21.

Shape properties of the gap. Suppose the rate satisfies R≦C′(1+D(δ_(c))/snr)/(1+1/τ²). The function g_(low)(x)−x−xD(δ_(c))/snr has at most one oscillation in [0, 1]. Likewise, the functions A(z) and G(z) have at most one oscillation for z≧−τ, and their critical values are denoted z_(A) and z_(G). For all Δ_(c)≧0, these satisfy z_(A)≦z_(G) and z_(A)≦−τ/2+1, which is less than or equal to 0 if τ≧2.

Moreover, if either Δ_(c)≦⅔ or Δ_(c)≧2√{square root over (2π)}half/τ,then z_(G) is also less than or equal to 0. Here half is an expressionnot much more than ½ as given in the proof.

The proof of Lemma 21 is in the appendix.

Note that τ≧3√{square root over (2π)} half is sufficient to ensure that one or the other of the two conditions on Δ_(c) must hold. That would entail a value of B more than e^(2.25π)>1174. Such a size of B is reasonable, though not essential, as one may choose directly to have a small value of Δ_(c) not more than ⅔.

One can pin down the location of z_(G) further, under additional conditions on Δ_(c). However, precise knowledge of the value of z_(G) is not essential, because the shape properties allow us to take advantage of tight lower bounds on A(z) for negative z as discussed in the next lemma.

It holds that z_(A)≦−τ/2+1 and, under conditions on Δ_(c), that z_(A)≦−τ/2. For −τ/2+1 to be negative, it is assumed that τ≧2, as is the case when B≧e². Preferably B is much larger.

Lemma 22.

Lower bounding A(z) for negative z: In an initial interval [−τ,t] with t=−τ/2 or −τ/2+1, the function A(z) is lower bounded by A(z)≧r−1−ε, where ε is [(2τ+t)/(t²+1)]φ(t). In particular for t=−τ/2 it is (6/τ)φ(τ/2), not more than (3/√{square root over (π log B)})(1/B)^(0.5), polynomially small in 1/B. Likewise, if t=−τ/2+1, the ε remains polynomially small.

If Δ_(c)≧4/(τ√{square root over (2π)})−1/τ², then the above inequality holds for all negative z,

${\min\limits_{{- \tau} \leq z \leq 0}{A(z)}} \geq {r - 1 - \varepsilon}$with   ε = (6/τ)ϕ(τ/2).

Finally, if also Δ_(c)≧8/τ², then for z between −τ/2 and 0, the A(z) is at least r−1, with no need for ε.

Demonstration of Lemma 22:

First, examine A(z) for z in an initial interval of the form [−τ,t]. Forsuch negative z one has that A(z) is at least r−1−2τφ(z) which is atleast r−1−2τφ(t). This is seen by observing that in the expression forA(z), the −zφ(z) term and the term involving Φ(z) are positive for z≦0.So for ε one can use 2τφ(t).

Further analysis of A(z) permits the improved value of ε as stated in the lemma. Indeed, A(z) may be expressed as A₀(z)=r−1−(2τ+z)φ(z)−(2τ+z)zΦ(z)+Φ(z) plus an additional amount [Δ_(c)(τ+z)²]Φ(z), which is positive. Its derivative simplifies as in the analysis in the previous lemma, and it is less than or equal to 0 for −τ≦z≦0, so A₀(z) is a decreasing function of z, and its minimum in [−τ,t] occurs at z=t.

Recall that along with the upper bound |z|Φ(z)≦φ(z), there is the lowerbound of Feller, |z|Φ(z)>[1−1/z²]φ(z), or the improvement in theappendix which yields |z|Φ(z)≧[1−1/(z²+1)]φ(z), which isΦ(z)≧(|z|/(z²+1)φ(z), for negative z. Accordingly obtainA ₀(z)≧r−1−[(2τ+z)/(z ²+1)]φ(z).At z=t=−τ/2 the amount by which it is less than r−1 is[(3/2)τ/(τ²/4+1)]φ(τ/2) not more than (6/τ)φ(τ/2), which is not morethan (6/√{square root over (2π2 log B)})(1/B)^(1/2). An analogous boundholds at t=−τ/2+1.

Next consider the value of A(z) at z=0. Recall that A(z) equals r−1−(2τ+z)φ(z)+[τ²+1−(1−Δ_(c))(τ+z)²]Φ(z). At z=0 it is r−1−2τ/√{square root over (2π)}+[1+Δ_(c)τ²]/2, which is at least r−1 if Δ_(c)τ²≧4τ/√{square root over (2π)}−1, that is, if Δ_(c)≧4/(τ√{square root over (2π)})−1/τ². This is seen to be greater than Δ_(c)**=2/(τ²/4+2), having assumed that τ is at least 2. So by the previous lemma A(z) is unimodal to the right of t=−τ/2, and it follows that the bound r−1−ε holds for all z in [−τ, 0].

Finally, for A(z) in the form r−1−(2τ+z)φ(z)−(2τ+z)zΦ(z)+[1+Δ_(c)(τ+z)²]Φ(z), replace −zΦ(z), which is |z|Φ(z), with its lower bound φ(z)−(1/|z|)Φ(z) for negative z, from the same inequality in the appendix. Then the terms involving φ(z) cancel and the lower bound on A(z) becomes r−1+[1+Δ_(c)(τ+z)²−(2τ+z)/|z|]Φ(z), which is r−1+[Δ_(c)(τ+z)²+2(τ+z)/z]Φ(z). In particular at z=−τ/2 it is r−1+[Δ_(c)τ²/4−2]Φ(−τ/2), which exceeds r−1 by a positive amount due to the stated conditions on Δ_(c). To determine the region in which the expression in brackets is positive more precisely, proceed as follows. Factoring out τ+z, the expression remaining in the brackets is Δ_(c)(τ+z)+2/z.

It starts out negative just to the right of −τ, and it hits 0 for z solving the quadratic Δ_(c)(τ+z)z+2=0, for which the left and right roots are z=[−τ±√{square root over (τ²−8/Δ_(c))}]/2, again centered at −τ/2. The left root is near −τ[1−2/(Δ_(c)τ²)]. So at least between these roots, and in particular between the left root and the point −τ/2, the A(z)≧r−1. The existence of these roots is implied by Δ_(c)>8/τ², which in turn is greater than Δ_(c)**=8/(τ²+8). So by the analysis of the previous Lemma, A′(z) is positive at −τ/2 and A(z) is unimodal to the right of −τ/2. Consequently A(z) remains at least r−1 for all z between the left root and 0. This completes the demonstration of Lemma 22.
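The bound of Lemma 22 can be spot-checked numerically over a grid, as in the brief illustrative sketch below (Python, Δ_(c)=0 and an arbitrary r):

import math

def Phi(z): return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
def phi(z): return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

B = 2**16
tau = math.sqrt(2.0 * math.log(B))
r, Delta_c = 5.0, 0.0

def A(z):
    return (r - 1.0 - 2.0 * tau * phi(z) - z * phi(z)
            + (tau**2 + 1.0 - (1.0 - Delta_c) * (tau + z)**2) * Phi(z))

eps = (6.0 / tau) * phi(tau / 2.0)
zs = [-tau + i * (tau / 2.0) / 1000 for i in range(1001)]  # grid on [-tau, -tau/2]
assert min(A(z) for z in zs) >= r - 1.0 - eps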

Exact evaluation of G(z_(crit)) is problematic, so instead takeadvantage for negative z of the tight lower bounds on G(z) that followimmediately from the above lower bounds on A(z). With no conditions onΔ_(c), use A(z)≧r−1−ε for z≦−τ/2+1 and unimodality of A(z) to the rightof there, to allow us to combine this with the bounds at z*. This use ofunimodality of A(z) has the slight disadvantage of needing to replaceu_(x)=1−xν with the lower bound 1−x*ν, and needing to replace−xD(δ_(c))/snr with −x*D(δ_(c))/snr, to obtain the combined lower boundon G(z) via A(z). In contrast, with conditions on Δ_(c), use directlythat the minimum of G(z) occurs at the minimum of the values at anegative z_(G) and at z*, allowing slight improvement on the gap.

Lemma 23.

Lower bounding G(z) for negative z: If Δ_(c)τ≧4/√{square root over (2π)}−1/(2τ), then for −τ<z≦0, setting z′=z(1+z/(2τ)), the function G(z) is at least

${{\left( {1 + \delta_{c}} \right)\frac{r - 1 - \varepsilon - {\left( {{2z^{\prime}\tau} - r} \right){{D\left( \delta_{c} \right)}/{snr}}}}{\left( {\tau + z} \right)^{2}{snr}}} - {\frac{D\left( \delta_{c} \right)}{snr}\left( {1 - \frac{\delta_{c}}{snr}} \right)}},$

which for r≧(1+ε)/(1+D(δ_(c))/snr) yields G(z) at least

$\frac{r - 1 - \varepsilon + {r\;{D\left( \delta_{c} \right)}/{snr}}}{\tau^{2}{snr}} - {\frac{D\left( \delta_{c} \right)}{snr}{\left( {1 - \frac{\delta_{c}}{snr}} \right).}}$

Consequently, the gap g(x)−x for z_(x)≦0 is at least

$\frac{r - r_{down}}{{snr}\;{\tau^{2}/\left( {1 + \delta_{c}} \right)}}$with${r_{down} = \frac{1 + \varepsilon + {\tau^{2}{D\left( \delta_{c} \right)}{\left( {1 - {\delta_{c}/{snr}}} \right)/\left( {1 + \delta_{c}} \right)}}}{1 + {{D\left( \delta_{c} \right)}/{snr}}}},$less than 1+ε+τ²D(δ_(c)). If also Δ_(c)≧8/τ², then the aboveinequalities hold for −τ/2≦z≦0 without the ε.

Demonstration of Lemma 23

Using the relationship between G(z) and A(z) given prior to Lemma 21,these conclusions follow immediately from plugging in the bounds on A(z)from Lemma 22.

Next combine the gap bounds for negative z with the gap bound for z*. This allows showing that g(x)−x has a positive gap as long as the rate drop from capacity is such that r>r_(crit) for a value of r_(crit) as identified. This holds for a range of choices of r₁, including 0.

Lemma 24.

The minimum value of the gap. For 0≦x≦x*, if r>r_(crit), then the g(x)−x is at least

$\frac{r - r_{crit}}{{snr}\left( {\tau^{2} + r_{1}} \right)}.$

This holds for an r_(crit) not more than r_(crit)* given by max{(τ²+r₁)D(δ_(c))+1+ε, r₁+(2τ+ζ)φ(ζ)+rem}, where, as before, rem=[(τ²+r₁)D(δ_(c))+1−r₁]Φ̄(ζ) and ε is as given in Lemma 22 with t=−τ/2+1. Then g_(L)(x)−x on [0, x*] has gap at least

${gap} = {\frac{r - r_{crit}}{{snr}\left( {\tau^{2} + r_{1}} \right)} - {\frac{2C}{vL}.}}$

Consequently, any specified positive value of gap is achieved by setting r=r_(crit)+snr(τ²+r₁)[gap+2C/(Lν)]. The contribution to the denominator of the rate expression (1+D(δ_(c))/snr)(1+r_(crit)/τ²) at r_(crit) has the representation in terms of r_(crit)* as 1+(1+r₁/τ²)D(δ_(c))/snr+r_(crit)*/τ². If Δ_(c)≧4/(τ√{square root over (2π)})−1/τ² and either Δ_(c)≦⅔ or Δ_(c)≧√{square root over (2π)} half/τ, then in the above characterization of r_(crit)* the D(δ_(c)) in the first expression of the max may be reduced to D(δ_(c))(1−δ_(c)/snr).

Moreover, there is the refinement that g(x)−x is at least

${\frac{1}{snr}\min\left\{ {\frac{r - r_{down}}{\tau^{2}/\left( {1 + \delta_{c}} \right)},\frac{r - r_{up}}{\tau^{2} + r_{1}}} \right\}},$

where r_(down) and r_(up) are as given in Lemmas 23 and 18, respectively. If also δ_(c) is such that the z_(G), of order −√{square root over (2 log(τ/δ_(c)))}, is between −τ/2 and 0, then the ε above may be omitted.

For given ζ>0, adjust r₁ to optimize the value of r_(crit)* in the nextsubsection.

The proof of the lemma will improve on the statement of the lemma byexhibiting an improved value of r_(crit) that makes use of r_(crit)*.

Demonstration of Lemma 24:

Replacing g(x) by its lower bound g_(low)(x)/[1+D(δ_(c))/snr] the g(x)−xis at least

${{gap}_{low}(x)} = \frac{{g_{low}(x)} - {x\left\lbrack {1 + {{D\left( \delta_{c} \right)}/{snr}}} \right\rbrack}}{1 + {{D\left( \delta_{c} \right)}/{snr}}}$which is

$\frac{{\left( {u_{x}/v} \right)\left( {R/C^{\prime}} \right){{A\left( z_{x} \right)}/\tau^{2}}} - {x\;{{D\left( \delta_{c} \right)}/{snr}}}}{1 + {{D\left( \delta_{c} \right)}/{snr}}}.$

For 0≦x≦x* the u_(x)R/(νC′) is at least its value at x*, which is 1/[snr(1+r₁/τ²)], so gap_(low)(x) is at least

$\frac{{{A\left( z_{x} \right)}/\left\lbrack {{snr}\left( {\tau^{2} + r_{1}} \right)} \right\rbrack} - {x^{*}{{D\left( \delta_{c} \right)}/{snr}}}}{1 + {{D\left( \delta_{c} \right)}/{snr}}},$which may also be written

$\frac{{A\left( z_{x} \right)} - {\left( {\tau^{2} + r_{1}} \right)x^{*}{D\left( \delta_{c} \right)}}}{{{snr}\left( {\tau^{2} + r_{1}} \right)}\left\lbrack {1 + {{D\left( \delta_{c} \right)}/{snr}}} \right\rbrack},$which by Lemma 18 coincides with (r−r_(up))/[snr(τ²+r₁)] at x=x*.

Now recall from Lemma 21 that A(z) is unimodal for z≧t, where t is −τ/2 or −τ/2+1, depending on the value of Δ_(c). As seen, when Δ_(c) is small, the A(z) is in fact decreasing, and so one may use r_(crit)=r_(up) from the gap at x*. For other Δ_(c), the unimodality of A(z) for z≧t implies that the minimum of A(z) over [−τ,z*] is equal to that over [−τ,t]∪{z*}. As seen in Lemma 22, the minimum of A(z) in [−τ,t] is given by A_(low)=r−1−ε. Consequently, the g(x)−x on 0≦x≦x* is at least

$\min{\left\{ {\frac{r - 1 - \varepsilon - {\left( {\tau^{2} + r_{1}} \right)x^{*}{D\left( \delta_{c} \right)}}}{{{snr}\left( {\tau^{2} + r_{1}} \right)}\left( {1 + {{D\left( \delta_{c} \right)}/{snr}}} \right)},\frac{r - r_{up}}{{snr}\left( {\tau^{2} + r_{1}} \right)}} \right\}.}$

Now x*=1−(r−r₁)/[snr(τ²+r₁)]. So (τ²+r₁)x* is equal to (τ²+r₁)−(r−r₁)/snr. Then, gathering the terms involving r, note that a factor of 1+D(δ_(c))/snr arises that cancels the corresponding factor from the denominator for the part involving r. Extract the value r shared by the two terms in the minimum to obtain that the above expression is at least

$\frac{r - r_{crit}}{{snr}\left( {\tau^{2} + r_{1}} \right)}$where here r_(crit) is given by

$\max{\left\{ {\frac{{\left( {\tau^{2} + r_{1}} \right){D\left( \delta_{c} \right)}} + 1 + \varepsilon + {r_{1}{{D\left( \delta_{c} \right)}/{snr}}}}{1 + {{D\left( \delta_{c} \right)}/{snr}}},r_{up}} \right\}.}$

Arrange 1+D(δ_(c))/snr as a common denominator. From the definition of r_(up) its numerator becomes r₁[1+D(δ_(c))/snr]+(2τ+ζ)φ(ζ)+rem. It follows that in the numerator the two expressions in the max share the term r₁D(δ_(c))/snr. Accordingly, with α=[D(δ_(c))/snr]/[1+D(δ_(c))/snr] and 1−α=1/[1+D(δ_(c))/snr], it holds that r_(crit)=αr₁+(1−α)r_(crit)*, with r_(crit)* given by max{(τ²+r₁)D(δ_(c))+1+ε, r₁+(2τ+ζ)φ(ζ)+rem}. This r_(crit)* exceeds r₁, because the amount added to r₁ in the second expression in the max is the same as the numerator of the shortfall δ*, which is positive. Hence αr₁+(1−α)r_(crit)* is less than r_(crit)*. So the r_(crit) here improves somewhat on the choice in the statement of the Lemma.

Moreover, from

$r_{crit} = \frac{r_{crit}^{*} + {r_{1}{{D\left( \delta_{c} \right)}/{snr}}}}{1 + {{D\left( \delta_{c} \right)}/{snr}}}$it follows that (1+D(δ_(c))/snr)(1+r_(crit)/τ²) is equal to

${1 + \frac{\left( {1 + {r_{1}/\tau^{2}}} \right){D\left( \delta_{c} \right)}}{snr} + \frac{r_{crit}^{*}}{\tau^{2}}},$as claimed.

Finally, for the last conclusion of the Lemma, it follows from the fact that

${{\min\limits_{{- \tau} < z \leq z^{*}}{G(z)}} = {\min\left\{ {{G\left( z_{G} \right)},{G\left( z^{*} \right)}} \right\}}},$invoking z_G≦0 and combining the bounds from Lemmas 23 and 18. This completes the demonstration of Lemma 24.

Note from the form of rem and using 1−Φ̄(ζ)=Φ(ζ) that r_crit*−ε may be written
max{(τ²+r₁)D(δ_c)+1, r₁Φ(ζ)+(2τ+ζ)φ(ζ)+[(τ²+r₁)D(δ_c)+1] Φ̄(ζ)}.
Thus [(τ²+r₁)D(δ_c)+1] appears both in the first expression and as a multiplier of Φ̄(ζ) in the remainder of the second expression in the max.

To clean the upcoming expressions, note that upon replacing the second expression in this max with the bound in which the polynomially small ε is added to it, r_crit*−1−ε becomes independent of ε. Accordingly, henceforth herein make that redefinition of r_crit*. Denoting r̃_crit*=r_crit*−1−ε, it becomes
max{(τ²+r₁)D(δ_c), (r₁−1)Φ(ζ)+(2τ+ζ)φ(ζ)+(τ²+r₁)D(δ_c) Φ̄(ζ)}.

Evaluation of the best r_crit* arises in the next subsection from determination of the r₁ that minimizes it.

9.6 Determination of δ_(c):

Here suitable choices of the leveling parameter δ_c are determined. Recall that δ_c=0 corresponds to no leveling and δ_c=snr corresponds to the constant power allocation; both will have their role, for very large and very small snr, respectively. Values in between are helpful in conjunction with controlling the rate drop parameter r_crit.

Recall the relationship 1+δ_c=(1+ζ/τ)²/(1+r₁/τ²), used in analysis of the gap based on g_low(x), where ζ is the value of z_x at the upper end point x* of the interval in which the gap property is invoked. In this subsection, hold ζ fixed and ask for the determination of a suitable choice of δ_c.

In view of the indicated relationship this is equivalent to the determination of a choice of r₁. There are choices that arise in obtaining manageable bounds on the rate drop. One is to set r₁=0, at which δ_c is near 2ζ/τ, proceeding with a case analysis depending on which of the two terms of r_crit* is largest. In the end this choice permits roughly the right form of bounds, but noticeable improvements in the constants arise with suitable non-zero r₁ in certain regimes.

Secondly, as determined in this section, one can find the r₁, or equivalently δ_c=δ_match, at which the two expressions in the definition of r_crit* match. In some cases this provides the minimum value of r_crit*.

Thirdly, keep in mind that a small mistake rate δ_mis* is desired as well as a small drop from capacity of the inner code. The use of the overall rate of the composite code provides means to express a combination of δ_mis*, r_crit* and D(δ_c)/snr to optimize.

In this subsection the optimization of δ_c for each ζ is addressed, and then in the next subsection the choice of nearly best values of ζ. In particular, this analysis provides means to determine regimes for which it is best overall to use δ_match, or for which it is best to use instead δ_c=0 or δ_c=snr.

For ζ>−τ, define ζ′ by
ζ′=ζ(1+ζ/2τ),
for which (1+ζ/τ)²=1+2ζ′/τ; define ψ=ψ(ζ) by
ψ=(2τ+ζ)φ(ζ)/Φ(ζ),
and γ=γ(ζ) by the small value
γ=2ζ′/τ+(ψ−1)/τ².

Lemma 25.

Match making. Given ζ, the choice of δ_c=δ_match that makes the two expressions in the definition of r_crit* be equal is given by
1+δ_c = e^{γ/(1+ζ/τ)²},
at which
1+r₁/τ² = (1+ζ/τ)² e^{−γ/(1+ζ/τ)²}.
This δ_c is non-negative for ζ such that γ≧0. At this δ_c=δ_match the value of r̃_crit*=r_crit*−1−ε is equal to
τ²(1+r₁/τ²)D(δ_c) = r₁+ψ−1,
which yields r̃_crit*/τ² equal to
(1+ζ/τ)²[e^{−γ/(1+ζ/τ)²}−1]+γ,
which is less than γ²/[2(1+ζ/τ)²] for γ>0. Moreover, the contribution δ_mis* to the mistake rate as in Lemma 17, at this choice of δ_c and corresponding r₁, is equal to

$\delta_{mis}^{*} = {\frac{\psi}{2\left( {\tau + \zeta} \right)^{2}C}.}$
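For concreteness, the match-making recipe can be checked numerically. The following sketch (illustrative Python only, not part of the claimed method; it assumes D(δ)=(1+δ)log(1+δ)−δ, consistent with the derivative D′(δ)=log(1+δ) used in the demonstration of Lemma 26 below, and the helper names are hypothetical) computes δ_match from ζ and τ and verifies the identity (τ²+r₁)D(δ_c)=r₁+ψ−1 at that choice.

```python
import math

def phi(z):   # standard normal density
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):   # standard normal cumulative distribution
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def D(delta):  # assumed form: D(delta) = (1+delta)log(1+delta) - delta
    return (1 + delta) * math.log(1 + delta) - delta

def delta_match(zeta, tau):
    """Leveling parameter at which the two expressions in r_crit* agree."""
    psi = (2 * tau + zeta) * phi(zeta) / Phi(zeta)
    zeta_prime = zeta * (1 + zeta / (2 * tau))
    gamma = 2 * zeta_prime / tau + (psi - 1) / tau ** 2
    return math.exp(gamma / (1 + zeta / tau) ** 2) - 1

# Check the identity (tau^2 + r1) D(delta_c) = r1 + psi - 1 at delta_match.
tau, zeta = 3.0, 0.5                      # illustrative values only
dc = delta_match(zeta, tau)
r1 = (tau + zeta) ** 2 / (1 + dc) - tau ** 2
psi = (2 * tau + zeta) * phi(zeta) / Phi(zeta)
print((tau ** 2 + r1) * D(dc), r1 + psi - 1)   # the two sides agree
```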

Remark:

Note from the definition of ψ and ζ′ that

$\gamma = {\frac{{\left( {2 + {\zeta/\tau}} \right)\left( {\zeta + {{\phi(\zeta)}/{\Phi(\zeta)}}} \right)} - {1/\tau}}{\tau}.}$Thus γ is near 2(ζ+φ(ζ)/Φ(ζ))/τ.

Using the tail properties of the normal given in the appendix, the expression ζ+φ(ζ)/Φ(ζ) is seen to be non-negative and increasing in ζ for all ζ on the line, near to 1/|ζ| for sufficiently negative ζ, and at least ζ for all positive ζ. In particular, γ is found to be non-negative for ζ at least slightly to the right of −τ.

Meanwhile, by such tail properties, φ(ζ)/Φ(ζ) is near |ζ| for sufficiently negative ζ, so to keep δ_mis* small it is desired to avoid such ζ unless the capacity C is very large. At ζ=0 the ψ equals 4τ/√(2π), so the δ_mis* there is of order 1/τ. When C is not large, a somewhat positive ζ is preferred to produce a small δ_mis* in balance with the rate drop contribution r_crit*/τ².

The ζ=0 case is illustrative for the behavior of γ and related quantities. There γ is (ψ−1)/τ², equal to 4/(τ√(2π))−1/τ², with which δ_c=e^γ−1. Also r₁/τ²=e^{−γ}−1. The r̃_crit*/τ²=r₁/τ²+(ψ−1)/τ² is then equal to e^{−γ}−1+γ, near to and upper bounded by γ²/2=(½)(ψ−1)²/τ⁴, less than 4/(πτ²). In this ζ=0 case, the slightly positive δ_c, with associated negative r₁, is sufficient to cancel the (ψ−1)/τ² part of r̃_crit*/τ², leaving just the small amount bounded by (½)(ψ−1)²/τ⁴. With (ψ−1)/τ² less than 2, that is a strictly superior value for r̃_crit*/τ² than obtained with δ_c=0 and ζ=0, for which r̃_crit*/τ² is (ψ−1)/τ².

Demonstration of Lemma 25:

The r_crit*−ε is the maximum of the two expressions
(τ²+r₁)D(δ_c)+1
and
r₁Φ(ζ)+(2τ+ζ)φ(ζ)+[(τ²+r₁)D(δ_c)+1] Φ̄(ζ).
Equating these, grouping like terms together using 1−Φ̄(ζ)=Φ(ζ), and then dividing through by Φ(ζ) yields
(τ²+r₁)D(δ_c)−r₁=ψ−1.
Using τ²+r₁ equal to (τ+ζ)²/(1+δ_c) and [D(δ_c)−1]/(1+δ_c) equal to log(1+δ_c)−1, the above equation may be written
(τ+ζ)²[log(1+δ_c)−1]+τ²=ψ−1.
Rearranging, it is

${\log\left( {1 + \delta_{c}} \right)} = {1 + \frac{\left( {\psi - 1} \right) - \tau^{2}}{\left( {\tau + \zeta} \right)^{2}}},$where the right side may also be written γ/(1+ζ/τ)². Exponentiating establishes the solution for 1+δ_c, with corresponding r₁ as indicated. Let's call the value that produces this equality δ_c=δ_match. At this solution the value of r̃_crit*=r_crit*−1−ε satisfies
(τ²+r₁)D(δ_c)=r₁+ψ−1.
Likewise, from the identity (τ²+r₁)D(δ_c)−(r₁−1)=ψ, multiplying by Φ̄(ζ) this establishes that the remainder used in Lemma 16 is in the present case equal to rem=ψ Φ̄(ζ), while the main part (2τ+ζ)φ(ζ) is equal to ψΦ(ζ). Adding them using Φ(ζ)+Φ̄(ζ)=1 shows that (2τ+ζ)φ(ζ)+rem is equal to ψ. This is the numerator in the mistake rate expression δ_mis*.

Using the form of r₁, the above expression for r̃_crit*/τ² may also be written
(1+ζ/τ)²[e^{−γ/(1+ζ/τ)²}−1]+γ.
With y=γ/(1+ζ/τ)² positive, the expression in the brackets is e^{−y}−1, which is less than −y+y²/2. Plugging that in, the part linear in y cancels, leaving the claimed bound γ²/[2(1+ζ/τ)²]. This completes the demonstration of Lemma 25.

Lemma 26.

The optimum δ_c quartet. For each ζ, consider the following minimizations. First, consider the minimization of r_crit* for δ_c in the interval [0, snr]. Its minimum occurs at the positive δ_c which is the minimum of the three values δ_thresh,0, δ_match, and snr, where δ_thresh,0=Φ(ζ)/Φ̄(ζ).

Second, consider the minimization of
(1+D(δ_c)/snr)(1+r_crit/τ²)
as arises in the denominator of the detailed rate expression. Its minimum for δ_c in [0, snr) occurs at the positive δ_c which is the minimum of the two values δ_thresh,1 and δ_match, where δ_thresh,1 is Φ(ζ)/[Φ̄(ζ)+1/snr].

Third, consider the minimization of the following combination of contributions to the inner code rate drop and the simplified mistake rate,
δ_mis,simp*+(1+D(δ_c)/snr)(1+r_crit/τ²)−1,
for δ_c in [0, snr). For Φ(ζ)≦1/(1+2C) its minimum occurs at δ_c=0; otherwise it occurs at the positive δ_c which is the minimum of the two values δ_thresh and δ_match, where

$\delta_{thresh} = {\frac{{\Phi(\zeta)} - {{\overset{\_}{\Phi}(\zeta)}/\left( {2C} \right)}}{{1/{snr}} + {{\overset{\_}{\Phi}(\zeta)}\left( {1 + {1/\left( {2C} \right)}} \right)}}.}$The same conclusion holds using δ_mis*=δ_mis,simp*/(1+ζ/τ)², replacing the occurrences of 2C in the previous sentence with 2C(1+ζ/τ)². Finally, set
Δ_(ζ,δ_c)=δ_mis*+(1+D(δ_c)/snr)(1+r_crit/τ²)−1
and extend the minimization to [0, snr] using the previously given specialized values in the δ_c=snr case. Then for each ζ the minimum Δ_(ζ,δ_c) for δ_c in [0, snr] is equal to the minimum over the four values 0, δ_thresh, δ_match and snr.
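A rough numerical rendition of the quartet search may clarify the bookkeeping. The sketch below (hypothetical Python; the ε term is dropped, and the δ_c=snr endpoint is scored with the simple Φ̄(ζ)+D(snr)/snr form from the remark that follows Lemma 26, so this is a sketch rather than the full case analysis) evaluates Δ_(ζ,δ_c) at the quartet candidates and reports the smallest.

```python
import math

def phi(z): return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
def Phi(z): return 0.5 * (1 + math.erf(z / math.sqrt(2)))
def Phibar(z): return 1.0 - Phi(z)
def D(d): return (1 + d) * math.log(1 + d) - d

def Delta(zeta, dc, tau, snr):
    """Delta_(zeta,delta_c) for 0 <= delta_c < snr, epsilon dropped."""
    C = 0.5 * math.log(1 + snr)
    r1 = (tau + zeta) ** 2 / (1 + dc) - tau ** 2
    rem = ((tau ** 2 + r1) * D(dc) - (r1 - 1)) * Phibar(zeta)
    rcrit = max((tau ** 2 + r1) * D(dc) + 1,
                r1 * Phi(zeta) + (2 * tau + zeta) * phi(zeta)
                + ((tau ** 2 + r1) * D(dc) + 1) * Phibar(zeta))
    mis = ((2 * tau + zeta) * phi(zeta) + rem) / (2 * C * tau ** 2 * (1 + zeta / tau) ** 2)
    return mis + (1 + r1 / tau ** 2) * D(dc) / snr + rcrit / tau ** 2

def quartet_min(zeta, tau, snr):
    C = 0.5 * math.log(1 + snr)
    psi = (2 * tau + zeta) * phi(zeta) / Phi(zeta)
    gamma = 2 * zeta * (1 + zeta / (2 * tau)) / tau + (psi - 1) / tau ** 2
    d_match = math.exp(gamma / (1 + zeta / tau) ** 2) - 1
    d_thresh = ((Phi(zeta) - Phibar(zeta) / (2 * C))
                / (1 / snr + Phibar(zeta) * (1 + 1 / (2 * C))))
    cands = [0.0] + [d for d in (d_thresh, d_match) if 0 < d < snr]
    vals = [Delta(zeta, d, tau, snr) for d in cands]
    vals.append(Phibar(zeta) + D(snr) / snr)   # delta_c = snr endpoint
    return min(vals)

print(quartet_min(zeta=0.5, tau=math.sqrt(2 * math.log(2 ** 16)), snr=7.0))
```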

Remark:

The Δ_(ζ,δ_c), when optimized also over ζ, will provide the Δ_shape summarized in the introduction. As shown in the next section, motivation for it arises from the total rate drop from capacity of the composition of the sparse superposition code with the outer Reed-Solomon code. For now just think of it as desirable to choose parameters that achieve a good combination of low rate drop and low fraction of section mistakes. As the proof here shows, the proposed combination is convenient for the calculus of this optimization.

Recall for 0≦δ_c<snr that (1+D(δ_c)/snr)(1+r_crit/τ²) equals (1+r₁/τ²)D(δ_c)/snr+1+r_crit*/τ². In contrast, for δ_c=snr, set δ_mis*=Φ̄(ζ) and r_crit=max{r_up,0}, using the form of r_up previously given for this case. These different forms arise because the g_low(x) bounds are used for 0≦δ_c<snr, whereas g(x) is used directly for δ_c=snr.

Demonstration of Lemma 26:

To determine the δ_(c) minimizing r_(crit)*, in the definition ofr_(crit)*−1−ε write the first expression (τ²+r₁)D(δ_(c)) in terms ofδ_(c) as

$\left( {\tau + \zeta} \right)^{2}{\frac{D\left( \delta_{c} \right)}{\left( {1 + \delta_{c}} \right)}.}$Take its derivative with respect to δ_c. The ratio D(δ_c)/(1+δ_c) has derivative equal to [D′(δ_c)(1+δ_c)−D(δ_c)] divided by (1+δ_c)². Now from the form of D(δ_c), its derivative D′(δ_c) is log(1+δ_c), so the expression in brackets simplifies to δ_c, which is non-negative, and multiplying by the positive factor (τ+ζ)²/(1+δ_c)² provides the desired derivative. Thus this first expression is increasing in δ_c, strictly so for δ_c>0. As for the second expression in the maximum, it is equal to the first expression times Φ̄(ζ) plus r₁+ψ−1 times Φ(ζ). So from the relationship of r₁ and δ_c, its derivative is equal to [δ_c Φ̄(ζ)−Φ(ζ)] times the same (τ+ζ)²/(1+δ_c)². So the value of the derivative of the first expression is larger than that of the second expression, and accordingly the maximum of the two expressions equals the first expression for δ_c≧δ_match and equals the second expression for δ_c<δ_match. The derivative of the second expression, being the indicated multiple of [δ_c Φ̄(ζ)−Φ(ζ)], is initially negative, so that the expression is initially decreasing, up to the point δ_thresh,0=Φ(ζ)/Φ̄(ζ) at which the derivative of this second expression is 0. So the optimizer of r_crit* occurs at the smallest of the three values δ_match, δ_thresh,0, and the right end point snr of the interval of consideration.

To minimize (1+D(δ_c)/snr)(1+r_crit/τ²)−1, multiplying through by τ², recall that it equals (τ²+r₁)D(δ_c)/snr+r_crit* for 0≦δ_c<snr. Add to the previous derivative values the amount δ_c/snr, which is again multiplied by the same factor (τ+ζ)²/(1+δ_c)². The first expression is still increasing. The second expression, after accounting for that factor, has derivative
δ_c/snr+δ_c Φ̄(ζ)−Φ(ζ).
It is still initially negative and hits 0 at δ_thresh,1=Φ(ζ)/[Φ̄(ζ)+1/snr], which is again the minimizer if it occurs before δ_match. Otherwise, if δ_match is smaller than δ_thresh,1, then, since to the right of δ_match the maximum equals the increasing first expression, it follows that δ_match is the minimizer.

Next determine the minimizer of the criterion that combines the rate drop contribution with the simplified section mistake contribution δ_mis,simp*. Multiplying through by τ², it adds the quantity (τ²+r₁)D(δ_c)−r₁+1 times Φ̄(ζ)/(2C), plus the amount (2τ+ζ)φ(ζ)/(2C) not depending on δ_c. So its derivative adds the expression (δ_c+1) Φ̄(ζ)/(2C) times the same factor (τ+ζ)²/(1+δ_c)². Thus, when the first part of the max is active, the derivative, after accounting for that factor, is
δ_c+δ_c/snr+(1+δ_c) Φ̄(ζ)/(2C),
whereas, when the second part of the max is active, it is
δ_c Φ̄(ζ)−Φ(ζ)+δ_c/snr+(1+δ_c) Φ̄(ζ)/(2C).
Again the first of these is positive and greater than the second. Where the value δ_c sits relative to δ_match determines which part of the max is active. For δ_c<δ_match it is the second. Initially, at δ_c=0, it is
−Φ(ζ)+ Φ̄(ζ)/(2C),
which is (1/(2C))[1−Φ(ζ)(1+2C)]. If ζ is small enough that Φ(ζ)≦1/(1+2C), this is at least 0. Then the criterion is increasing to the right of δ_c=0, whence δ_c=0 is the minimizer. Else if Φ(ζ)>1/(1+2C), then initially the derivative is negative and the criterion is initially decreasing. Then as before the minimum value is either at δ_thresh or at δ_match, whichever is smallest. Here δ_thresh is the point where the function based on the second expression in the maximum has 0 derivative. The same conclusions hold with δ_mis*=δ_mis,simp*/(1+ζ/τ)² in place of δ_mis,simp*, except that the denominator 2C is replaced with 2C(1+ζ/τ)². Examining δ_thresh,1 and δ_thresh, it is seen that these are less than snr. Nevertheless, when minimizing over [0, snr], the minimum can arise at snr because of the different form assigned to the expressions in that case. Accordingly the minimum of Δ_(ζ,δ_c) for δ_c in [0, snr] is equal to the minimum over the four values 0, δ_thresh, δ_match and snr, referred to as the optimum δ_c quartet. This completes the demonstration of Lemma 26.

Remark:

To be explicit as to the form of Δ_(ζ,δ_c) with δ_c=snr, recall that in this case 1+r_up/τ² is
(1−snr Φ̄(ζ))(1+ζ/τ)²/(1+snr).
Consequently Δ_(ζ,δ_c)=δ_mis*+(1+D(δ_c)/snr)(1+r_crit/τ²)−1, in this δ_c=snr case, becomes

${{\overset{\_}{\Phi}(\zeta)} + {\frac{\left( {1 + {{D({snr})}/{snr}}} \right)}{\left( {1 + {snr}} \right)}\left( {1 + {{snr}\,{\overset{\_}{\Phi}(\zeta)}}} \right)\left( {1 + {\zeta/\tau}} \right)^{2}} - 1},$when r_up≧0. For r_up<0, as is true for sufficiently small contributions from snr Φ̄(ζ) and ζ/τ, simply set r_crit=0 to avoid complications from the conditions of Corollary 19. Then Δ_(ζ,snr) becomes
Φ̄(ζ)+D(snr)/snr.

9.7 Inequalities for ψ, γ, and r̃_crit*:

At δ_c=δ_match, the r_crit* is examined further. Previously, in Lemma 25, the expression r̃_crit*/τ² was shown to be less than γ²/[2(1+ζ/τ)²]. Now this bound is refined in the cases of negative and positive ζ. For negative ζ it is shown that γ≦2/(τ|ζ|), and for positive ζ it is shown that r̃_crit* is not more than max{2(ζ′)², ψ−1}. For sufficiently positive ζ it is not more than 2ζ².

Recall that γ is less than

$\frac{\left( {2 + {\zeta/\tau}} \right)\left( {\zeta + {{\phi(\zeta)}/{\Phi(\zeta)}}} \right)}{\tau}$and that ψ=(2τ+ζ)φ(ζ)/Φ(ζ).

Lemma 27.

Inequalities for negative ζ. For −τ<ζ≦0, the γ is an increasing function less than min{2/|ζ|, 4/√(2π)}/τ. Likewise the function ψ is less than 2(|ζ|+1/|ζ|)τ.

Demonstration of Lemma 27:

For ζ≦0, the increasing factor 2+ζ/τ is less than 2, and the factor ζ+φ(ζ)/Φ(ζ) is non-negative, increasing, and less than 1/|ζ| by the normal tail inequalities in the appendix. At ζ=0 this factor is 2/√(2π). As for ψ, the factor φ(ζ)/Φ(ζ) is at least |ζ| and not more than |ζ|+1/|ζ| for negative ζ, again by the normal tail inequalities in the appendix (where improvements are given, especially for 0≦|ζ|≦1). This completes the demonstration of Lemma 27.
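The tail facts just used are easy to spot-check; a minimal sketch (illustrative Python, with the appendix's inequalities taken as stated) confirms that φ(ζ)/Φ(ζ) lies between |ζ| and |ζ|+1/|ζ| for negative ζ.

```python
import math

def phi(z): return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
def Phi(z): return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# For zeta < 0: |zeta| <= phi(zeta)/Phi(zeta) <= |zeta| + 1/|zeta|,
# so the factor zeta + phi(zeta)/Phi(zeta) lies in [0, 1/|zeta|].
for z in (-0.5, -1.0, -2.0, -4.0):
    ratio = phi(z) / Phi(z)
    print(z, ratio, abs(z) <= ratio <= abs(z) + 1 / abs(z))
```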

Now turn attention to non-negative ζ. Three bounds on r̃_crit* are given: the first based on γ²/2, and the other two more exacting, to determine the relative effects of 2(ζ′)² and ψ−1.

Corollary 28.

For ζ≧0 it holds that r̃_crit*/τ²≦γ²/2 and
r̃_crit*≦2(ζ+φ(ζ)/Φ(ζ))².

Demonstration of Corollary 28:

By Lemma 25, the r̃_crit*/τ² is not more than γ²/[2(1+ζ/τ)²]. Now γ is not more than

$\frac{2\left( {1 + {\zeta/\left( {2\tau} \right)}} \right)\left( {\zeta + {{\phi(\zeta)}/{\Phi(\zeta)}}} \right)}{\tau}.$Consequently, r̃_crit* is not more than

$\frac{2\left( {1 + {\zeta/\left( {2\tau} \right)}} \right)^{2}\left( {\zeta + {{\phi(\zeta)}/{\Phi(\zeta)}}} \right)^{2}}{\left( {1 + {\zeta/\tau}} \right)^{2}}.$Using 1+ζ/(2τ) not more than 1+ζ/τ completes the demonstration of Corollary 28.

Lemma 29.

Direct r_crit* bounds. Let r̃_crit*=r_crit*−1−ε, evaluated at δ_match. Bounds are provided depending on whether D(2ζ′/τ) or (ψ−1)/τ² is larger. In the case D(2ζ′/τ)≧(ψ−1)/τ², the r̃_crit* satisfies
r̃_crit*/τ² ≦ D(2ζ′/τ).
In any case, the value of r̃_crit*/τ² may be represented as an average of D(2ζ′/τ) and (ψ−1)/τ² plus a small excess, where the weight assigned to (ψ−1)/τ² is proportional to the small 2ζ′/τ. Indeed r̃_crit*/τ² equals

$\frac{{D\left( {2{\zeta^{\prime}/\tau}} \right)} + {\frac{\left( {\psi - 1} \right)}{\tau^{2}}2{\zeta^{\prime}/\tau}}}{1 + {2{\zeta^{\prime}/\tau}}} + {excess}$where excess is e^(−υ)−(1−υ) evaluated at

$\upsilon = {\frac{\frac{\psi - 1}{\tau^{2}} - {D\left( {2{\zeta^{\prime}/\tau}} \right)}}{1 + {2{\zeta^{\prime}/\tau}}}.}$In the case (ψ−1)/τ²>D(2ζ′/τ) it satisfies

${excess} \leq {\frac{\left\lbrack {\frac{\psi - 1}{\tau^{2}} - {D\left( {2{\zeta^{\prime}/\tau}} \right)}} \right\rbrack^{2}}{2\left( {1 + {\zeta/\tau}} \right)^{4}}.}$

Demonstration of Lemma 29:

With the relationship between r₁ and δ_c, recall that (τ²+r₁)D(δ_c) is increasing in δ_c and hence decreasing in r₁. The r₁ that provides the match makes (τ²+r₁)D(δ_c) equal r₁+ψ−1. At r₁=0, the first is τ²D(2ζ′/τ), so if that be larger than ψ−1 then a positive r₁ is needed to bring it down to the matching value. Then r̃_crit* is less than τ²D(2ζ′/τ). Whereas if τ²D(2ζ′/τ) is less than ψ−1, then r̃_crit* is greater than τ²D(2ζ′/τ), but not by much, as shall be seen. In any case, write r̃_crit*/τ² as

$\frac{r_{1}}{\tau^{2}} + \frac{\psi - 1}{\tau^{2}}$which by Lemma 25 is

$\frac{\psi - 1}{\tau^{2}} + {\left( {1 + {\zeta/\tau}} \right)^{2}{\mathbb{e}}^{{- \gamma}/{({1 + {\zeta/\tau}})}^{2}}} - 1.$Use γ=2ζ′/τ+(ψ−1)/τ², and for this proof abbreviate a=(ψ−1)/τ² and b=2ζ′/τ. The exponent γ/(1+ζ/τ)² is then (a+b)/(1+b), and the expression for r̃_crit*/τ² becomes
a+(1+b)e^{−(a+b)/(1+b)}−1.
Add and subtract D(b) in the numerator to write (a+b)/(1+b) as (a−D(b))/(1+b) plus (b+D(b))/(1+b), where by the definition of D(b) the latter term is simply log(1+b), which leads to a cancellation of the 1+b outside the exponent. So the above expression becomes
a+e^{−(a−D(b))/(1+b)}−1,
which is a+e^{−υ}−1=a−υ+excess, where excess=e^{−υ}−(1−υ) and υ=(a−D(b))/(1+b). For a≧D(b), that is, υ≧0, the excess is less than υ²/2, by the second order expansion of e^{−υ}, since the second derivative is bounded by 1, which provides the claimed control of the remainder. The a−υ may be written as [D(b)+ba]/(1+b), the average of D(b) and a with weights 1/(1+b) and b/(1+b), or equivalently as D(b)+b(a−D(b))/(1+b). Plugging in the choices of a and b completes the demonstration of Lemma 29.

An implication, when ζ and ψ−1 are positive, is that r̃_crit*/τ² is not more than

${D\left( {2{\zeta^{\prime}/\tau}} \right)} + \frac{2{\zeta^{\prime}\left( {\psi - 1} \right)}}{\tau^{3}} + {\frac{\left( {\psi - 1} \right)^{2}}{\tau^{4}}.}$

This bound, and its sharper form in the above lemma, shows that r̃_crit*/τ² is not much more than D(2ζ′/τ), which in turn is less than 2(ζ′)²/τ², near 2ζ²/τ².

Also take note of the following monotonicity property of the function ψ(ζ) for ζ≧0. It uses the fact that τ≧1. Indeed, τ≧√(2 log B) is at least √(2 log 2)=1.18.

Lemma 30.

Monotonicity of ψ: With τ≧1.0, the positive function ψ(z)=(2τ+z)φ(z)/Φ(z) is strictly decreasing for z≧0. Its maximum value is ψ(0)=4τ/√(2π)≦1.6τ. Moreover, γ=2ζ′/τ+(ψ−1)/τ² is positive.

Demonstration of Lemma 30:

The function ψ(z) is clearly strictly positive for z≧0. Its derivative is seen to be
ψ′(z)=−[ψ(z)+z(2τ+z)−1]φ(z)/Φ(z).

Noting that the function τ²γ(z) matches the expression in brackets, this derivative equals
−τ²γ(z)φ(z)/Φ(z).
The τ²γ(z) is at least ψ(z)+2τz−1, and it remains to show that it is positive for all z≧0. It is clearly positive for z≧1/(2τ). For 0≦z≦1/(2τ), lower bound it by lower bounding ψ(z)−1 by 2τφ(1/2τ)/Φ(1/2τ)−1, which is positive provided 1/(2τ) is less than the unique point z=z_root>0 where φ(z)=zΦ(z). Direct evaluation shows that this z_root is between 0.5 and 0.6. So τ≧1.0 suffices for the positivity of γ(z), and equivalently the negativity of ψ′(z), for all z≧0. This completes the demonstration of Lemma 30.
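The claimed location of z_root is easily confirmed by bisection; the following fragment (illustrative Python, assuming only the standard normal functions) locates the root of φ(z)=zΦ(z).

```python
import math

def phi(z): return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
def Phi(z): return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Bisection on phi(z) - z*Phi(z), which changes sign between 0.5 and 0.6.
lo, hi = 0.5, 0.6
for _ in range(60):
    mid = (lo + hi) / 2
    if phi(mid) - mid * Phi(mid) > 0:
        lo = mid
    else:
        hi = mid
print((lo + hi) / 2)   # approximately 0.506, indeed between 0.5 and 0.6
```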

The monotonicity of ψ(ζ) is associated with decreasing shortfall δ*, asζ is increased, though with the cost of increasing r_(crit)*. Evaluatingr_(crit)* as a function of ζ enables control of the tradeoff.

Remark: The rate R=C/(1+r/τ²) has been parameterized by r. As stated in Lemma 24, the relationship between the gap and r, expressed as gap=(r−r_crit)/[snr(τ²+r₁)], may also be written r=r_crit+snr(τ²+r₁)gap. Recall also that one may set gap=η+f̄+1/(m−1), with f̄=mρf*. In this way, the rate parameter r is determined from the choices of ζ that appear in r_crit, as well as from the parameters m, f̄ and η that control, respectively, the number of steps, the fraction of false alarms, and the exponent of the error probability.

The importance of ζ in this section is that it provides for the evaluation of r_crit, and through r₁ it controls the location of the upper end of the region in which g(x)−x is shown to exceed a target gap. For any ζ, the above remark conveys the smallest size rate drop parameter r for which that gap is shown to be achieved.

In the rate representation R, draw attention to the product of two of the denominator factors (1+D(δ_c)/snr)(1+r/τ²). Here below these factors are represented in a way that exhibits the dependence on r_crit* and the gap.

Using r equal to r_crit+snr gap τ²(1+r₁/τ²), write the factor 1+r/τ² as the product (1+r_crit/τ²)(1+ξ snr gap), where ξ is the ratio (1+r₁/τ²)/(1+r_crit/τ²), a value between 0 and 1, typically near 1. Thus the product (1+D(δ_c)/snr)(1+r/τ²) takes the form
(1+D(δ_c)/snr)(1+r_crit/τ²)(1+ξ snr gap).
Recall that (1+D(δ_c)/snr)(1+r_crit/τ²) is equal to

$1 + \frac{\left( {1 + {r_{1}/\tau^{2}}} \right){D\left( \delta_{c} \right)}}{snr} + \frac{r_{crit}^{*}}{\tau^{2}}$which, at r₁=r_(1,match), is equal to

$1 + \frac{{\overset{\sim}{r}}_{crit}^{*}}{{snr}\;\tau^{2}} + {\frac{{\overset{\sim}{r}}_{crit}^{*} + 1 + \varepsilon}{\tau^{2}}.}$So in this way these denominator factors are expressed in terms of the gap and r̃_crit*, where r̃_crit* is near 2ζ² by the previous corollary.

Complete this subsection by inquiring whether r_crit is positive for relevant ζ. By the definition of r_crit, its positivity is equivalent to the positivity of r_crit*+r₁D(δ_c)/snr, which is not less than 1+ε+(τ²+r₁)D(δ_c)+r₁D(δ_c)/snr. The multiplier of D(δ_c) is τ²+r₁(1+1/snr), which is positive for r₁≧−τ²snr/(1+snr). So it is asked whether that be a suitable lower bound on r₁. Recall the relationship between x* and r₁,

${1 - x^{*}} = {\frac{r - r_{1}}{{snr}\left( {\tau^{2} + r_{1}} \right)} = {{gap} + {\frac{r_{crit} - r_{1}}{{snr}\left( {\tau^{2} + r_{1}} \right)}.}}}$Recognizing that r_crit−r₁ equals r_crit*−r₁ divided by 1+D(δ_c)/snr, expressing D(δ_c) in terms of r_crit* and r₁ as above, one can rearrange this relationship to reveal the value of r₁ as a function of x*+gap and r_crit*. Using r_crit*>0, one finds that the minimal r₁ to achieve positive x*+gap is indeed greater than −τ²snr/(1+snr).

9.8 Determination of ζ:

In this subsection solve, where possible, for the optimal choice of ζ in the expression Δ_(ζ,δ_c), which balances contributions to the rate drop with the quantity δ_mis* related to the mistake rate. As above it is
Δ_(ζ,δ_c)=δ_mis*+(1+D(δ_c)/snr)(1+r_crit/τ²)−1.
Also work with the simplified form Δ_(ζ,δ_c,simp), in which δ_mis,simp* is used in place of δ_mis*. For 0≦δ_c<snr, this Δ_(ζ,δ_c) coincides, as seen herein above, with

${\frac{{\left( {{2\tau} + \zeta} \right){\phi(\zeta)}} + {rem}}{2\; C\;{\tau^{2}\left( {1 + {\zeta/\tau}} \right)}^{2}} + \frac{\left( {1 + {r_{1}/\tau^{2}}} \right){D\left( \delta_{c} \right)}}{snr} + \frac{r_{crit}^{*}}{\tau^{2}}},$where rem=[(τ²+r₁)D(δ_c)−(r₁−1)] Φ̄(ζ). Define Δ_(ζ,δ_c,simp) to be the same but without the (1+ζ/τ)² in the denominator of the first part.

Seek to optimize Δ_(ζ,δ_c) or Δ_(ζ,δ_c,simp) over choices of ζ for each of the quartet of choices of δ_c given by 0, δ_thresh, δ_match and snr. The minimum of Δ_(ζ,δ_c) provides what is denoted as Δ_shape, as summarized in the introduction.

Optimum or near optimum choices for ζ are provided for the cases of δ_c equal to 0, δ_match, and snr, respectively. These provide distinct ranges of the signal to noise ratio for which these cases provide the smallest Δ_(ζ,δ_c). At present, the inventors herein have not been able to determine whether the minimum Δ_(ζ,δ_thresh) has a range of signal to noise ratios at which its minimum is superior to what is obtained with the best of the other cases. What the inventors can confirm regarding δ_thresh is that for small snr the min_ζ Δ_(ζ,δ_thresh) requires δ_thresh near snr, and that for snr above a particular constant, the minimum Δ_(ζ,δ_thresh) matches min_ζ Δ_(ζ,0) with δ_thresh=0 at the minimizing ζ.

Optimal choices of ζ for the cases of δ_c equal to 0, δ_match, and snr, respectively, provide three disjoint intervals R₁, R₂, R₃ of signal to noise ratio. The case of δ_c=0 provides the optimum for the high end of snr in R₃; the case of δ_c=δ_match provides the best bounds for the intermediate range R₂; and the case of δ_c=snr provides the optimum for the low snr range R₁.

The tactic is to consider these choices of δ_c separately, either optimizing over ζ to the extent possible or providing reasonably tight upper bounds on min_ζ Δ_(ζ,δ_c), and then inspect the results to see the ranges of snr for which each is best.

Note directly that Δ_(ζ,δ_c) is a decreasing function of snr for the δ_c=0 and δ_c=δ_match cases, so min_ζ Δ_(ζ,δ_c) is also decreasing in snr. Likewise for Δ_(ζ,δ_c,simp).

Remember that log base e is used, so the capacity is measured in nats.

Lemma 31.

Optimization of Δ_(ζ,δ_c,simp) with δ_c=0. At δ_c=0, the Δ_(ζ,0,simp) is optimized at the 1/(2C+1) quantile of the standard normal distribution,
ζ=ζ_C=Φ⁻¹(1/(2C+1)).
If C>1/2, this ζ_C is less than or equal to 0 and min_ζ Δ_(ζ,0,simp) is not more than

$\frac{\left( {{2\; C} + 1} \right){\phi\left( \zeta_{C} \right)}}{C\;\tau} + {\frac{1}{\tau^{2}}.}$Dividing the first term by (1+ζ_C/τ)² gives an upper bound on min_ζ Δ_(ζ,0) valid for ζ_C>−τ. The bound is decreasing in C when ζ_C>−τ+1. Let C_large, exponentially large in τ²/2, be such that ζ_{C_large}=−τ+1. For C>C_large, use ζ=−τ+1 in place of ζ_C; then the first term of this bound is exponentially small in τ²/2 and hence polynomially small in 1/B.

Thus the ζ=ζ_C* advocated for δ_c=0 is
ζ_C*=max{ζ_C, −τ+1}.
Examination of the bound shows an implication of this Lemma. When C is large compared to τ, the Δ_shape is near 1/τ². This is clarified in the following corollary, which provides slightly more explicit bounds.
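The quantile ζ_C and the resulting bound are straightforward to evaluate; a small sketch (hypothetical Python, with τ taken as √(2 log B) for simplicity, ignoring the (1+δ_a) inflation) computes ζ_C*=max{ζ_C,−τ+1} and the Lemma 31 bound.

```python
import math

def inv_Phi(p):
    """Standard normal quantile by bisection (illustrative, not optimized)."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def zeta_C_star(C, tau):
    """zeta_C* = max{Phi^{-1}(1/(2C+1)), -tau+1} advocated for delta_c = 0."""
    return max(inv_Phi(1 / (2 * C + 1)), -tau + 1)

tau = math.sqrt(2 * math.log(2 ** 16))    # B = 2^16, delta_a ignored
for C in (1.0, 2.2, 10.0):
    z = zeta_C_star(C, tau)
    bound = (2 * C + 1) * math.exp(-z * z / 2) / math.sqrt(2 * math.pi) / (C * tau) + 1 / tau ** 2
    print(C, z, bound)
```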

Corollary 32.

Bounding min Δ_(ζ,0) with δ_c=0. To upper bound min_ζ Δ_(ζ,0,simp), the choice ζ=0 provides

${\frac{\left( {{2\; C} + 1} \right)}{C}\left( {\frac{1}{\tau\sqrt{2\pi}} + \frac{1}{4\tau^{2}}} \right)},$which also bounds min_ζ Δ_(ζ,0). Moreover, when C≧1/2, the optimum ζ_C satisfies |ζ_C|≦√(2 log(C+1/2)) and provides the following bound, which improves on the ζ=0 choice when C≧2.2,

${\frac{\xi\left( \left| \zeta_{C} \right| \right)}{C\;\tau} + \frac{1}{\tau^{2}}},$not more than

${\frac{\xi\left( \sqrt{2\;{\log\left( {C + {1/2}} \right)}} \right)}{C\;\tau} + \frac{1}{\tau^{2}}},$where ξ(z) equals z+1/z for z≧1 and equals 2 for 0<z<1. Dividing the first term by (1+ζ_C/τ)² gives an upper bound on min_ζ Δ_(ζ,0) of

$\frac{\xi\left( \left| \zeta_{C} \right| \right)}{C\;{\tau\left( {1 + {\zeta_{C}/\tau}} \right)}^{2}} + {\frac{1}{\tau^{2}}.}$When B>1+snr, this bound on min_ζ Δ_(ζ,0) improves on the bound with ζ=0 for C≧5.5. As before, when C≧C_large, for which ζ_{C_large}=−τ+1, use the bound with C_large in place of C.

The min_ζ Δ_(ζ,0,simp) bound above is smaller than that given below for min_ζ Δ_(ζ,δ_match,simp) when the snr is large enough that an expression of order C/(log C)^(3/2) exceeds τ.

The quantity d=d_snr=2C/ν=(1+1/snr) log(1+snr) has a role in what follows. It is an increasing function of snr, with value always at least 1.

For 2C/ν≧τ/√(2π) use non-positive ζ, whereas for 2C/ν<τ/√(2π) use positive ζ. Thus the discriminant of whether to use positive ζ is the ratio ω=d/τ and whether it is smaller than 1/√(2π). This ratio ω is

$\omega = {\frac{d}{\tau} = {\frac{2\; C}{v\;\tau}.}}$
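Whether positive or non-positive ζ is called for is a one-line computation; the sketch below (illustrative Python; τ is again taken as √(2 log B), without the (1+δ_a) factor) evaluates the discriminant ω.

```python
import math

def omega(snr, B):
    """omega = d/tau = 2C/(nu*tau), the discriminant for the sign of zeta."""
    C = 0.5 * math.log(1 + snr)
    nu = snr / (1 + snr)
    tau = math.sqrt(2 * math.log(B))
    return (2 * C / nu) / tau

for snr in (0.5, 7.0, 100.0):
    w = omega(snr, B=2 ** 16)
    side = "positive zeta" if w < 1 / math.sqrt(2 * math.pi) else "non-positive zeta"
    print(snr, round(w, 3), side)
```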

In the next two lemmas use δ_c=δ_match. Using the results of Lemma 25 and 1+1/snr=1/ν, the form of Δ_(ζ,δ_match) simplifies to

$\frac{\psi}{2\; C\;{\tau^{2}\left( {1 + {\zeta/\tau}} \right)}^{2}} + {\frac{1}{v}\frac{{\overset{\sim}{r}}_{crit}^{*}}{\tau^{2}}} + {\frac{1 + \varepsilon}{\tau^{2}}.}$

Recall for negative ζ that ψ is near 2τ|ζ| and r̃_crit*, through γ²/2, is near 2/ζ², with associated bounds given in Lemma 27. So it is natural to set a negative ζ that minimizes −ζ/(Cτ)+2/(νζ²τ²), for which the solution is
ζ=−(4C/(ντ))^(1/3)=−(2ω)^(1/3),
which is here denoted ζ_(1/3).

Lemma 33.

Optimization of Δ_(ζ,δ_c) at δ_c=δ_match: Bounds from non-positive ζ. The choice of ζ=0 yields the upper bound on min_ζ Δ_(ζ,δ_match) of

$\frac{2}{C\;\tau\sqrt{2\pi}} + \frac{4}{v\;\tau^{2}\pi} + {\frac{1 + \varepsilon}{\tau^{2}}.}$As for negative ζ, the choice ζ=ζ_(1/3)=−(4C/(ντ))^(1/3) yields the upper bound on min_ζ Δ_(ζ,δ_match) of

${\frac{1}{\left( {1 + {\zeta_{1/3}/\tau}} \right)^{2}}\left( {\frac{2.4}{v^{1/3}C^{2/3}\tau^{4/3}} + \frac{2}{\left| \zeta_{1/3} \right| C\;\tau}} \right)} + {\frac{1 + \varepsilon}{\tau^{2}}.}$

Amongst the bounds so far with ζ≦0, the first term, controlling δ_mis*, is smallest at ζ=0, where it is 2/[Cτ√(2π)]. The advantage of going negative is that then the 4/[ντ²π] term is replaced by terms that are smaller for large C.

Comparison:

The two bounds in Lemma 33 may be written as

${\frac{4}{v\;\tau^{2}}\left\lbrack {\frac{1}{\sqrt{2\pi}\omega} + \frac{1}{\pi}} \right\rbrack} + \frac{1 + \varepsilon}{\tau^{2}}$and${{\frac{4}{v\;\tau^{2}}\left\lbrack {\frac{1.5}{\left( {2\omega} \right)^{2/3}} + \frac{2}{\left( {2\omega} \right)^{4/3}}} \right\rbrack} + \frac{1 + \varepsilon}{\tau^{2}}},$respectively, neglecting the (1+ζ_(1/3)/τ)² factor. Numerical comparison of the expressions in the brackets reveals that the former, from ζ=0, is better for ω<5.37, while the latter, from ζ=ζ_(1/3), is better for ω≧5.37, which is for |ζ_(1/3)|≧2.2.
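The stated crossover is reproducible by direct evaluation of the two bracketed expressions; a minimal scan (illustrative Python) recovers the value near 5.37.

```python
import math

def bracket_zero(w):    # bracket of the zeta = 0 bound
    return 1 / (math.sqrt(2 * math.pi) * w) + 1 / math.pi

def bracket_third(w):   # bracket of the zeta = zeta_{1/3} bound
    return 1.5 / (2 * w) ** (2 / 3) + 2 / (2 * w) ** (4 / 3)

w = 1.0
while bracket_zero(w) < bracket_third(w):
    w += 0.01
print(round(w, 2))      # close to 5.37
```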

Next compare the leading term of the bound using ζ_(1/3) and δ_c=δ_match to the corresponding part of the bound using ζ_C and δ_c=0. These are, respectively,

$\frac{2.4}{v^{1/3}C^{2/3}\tau^{4/3}}$ and $\frac{\xi\left( \zeta_{C} \right)}{C\;\tau}.$

From this comparison the δ_c=0 solution is seen to be better when

$\frac{4.5\; C}{{v\left( {\xi\left( \zeta_{C} \right)} \right)}^{3}} > {\tau.}$Modified to take into account the factors 1+ζ_(1/3)/τ and 1+ζ_C*/τ, this condition defines the region R₃ of very large snr for which δ_c=0 is best. To summarize, it corresponds to snr large enough that an expression near 4.5C/(log C)^(3/2) exceeds τ, or, equivalently, that C is at least a value of order τ(log τ)^(3/2), near to (τ/4.5)(log(τ/4.5))^(3/2), for sufficient size τ.

Next consider the case of ω=d/τ less than 1/√(2π), for which positive ζ is used. The function φ(ζ)/Φ(ζ) is strictly decreasing. From its inverse, let ζ_ω be the unique value at which φ(ζ)/Φ(ζ)=2ω. It is used to provide a tight bound on the optimal Δ_(ζ,δ_match).

Lemma 34.

Optimization of Δ_(ζ,δ_c) at δ_c=δ_match: Bounds from positive ζ. Consider the case that τ/√(2π)≧2C/ν. Let ω=2C/(ντ). The choice of ζ=ζ_ω yields Δ_(ζ,δ_match) not more than

${\frac{2}{v\;\tau^{2}}\left\lbrack {2 + \left( {\zeta_{\omega} + {2\omega}} \right)^{2}} \right\rbrack} + {\frac{1 + \varepsilon}{\tau^{2}}.}$This ζ_ω is not more than

$\zeta_{\omega}^{*} = \sqrt{2\mspace{11mu}{\log\left( {{1/2} + {1/\left( {2\omega\sqrt{2\pi}} \right)}} \right)}}$which is

$\sqrt{2\mspace{11mu}{\log\left( {\frac{1}{2} + \frac{\tau\; v}{4C\sqrt{2\pi}}} \right)}},$at which Δ_(ζ,δ_match) is not more than

${\frac{2}{v\;\tau^{2}}\left\lbrack {2 + \left( {\sqrt{2{\log\left( {\frac{1}{2} + \frac{\tau\; v}{4C\sqrt{2\pi}}} \right)}} + \frac{4C}{\tau\; v}} \right)^{2}} \right\rbrack} + {\frac{1 + \varepsilon}{\tau^{2}}.}$
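Since φ(ζ)/Φ(ζ) is strictly decreasing, ζ_ω is computable by bisection; the following sketch (hypothetical Python) finds ζ_ω and confirms that it is upper bounded by ζ_ω* for an illustrative ω.

```python
import math

def phi(z): return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
def Phi(z): return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def zeta_omega(w):
    """Solve phi(z)/Phi(z) = 2w; the ratio is strictly decreasing in z."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if phi(mid) / Phi(mid) > 2 * w:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

w = 0.25                      # any omega below 1/sqrt(2*pi), about 0.399
z = zeta_omega(w)
z_star = math.sqrt(2 * math.log(0.5 + 1 / (2 * w * math.sqrt(2 * math.pi))))
print(z, z_star, z <= z_star)  # zeta_omega is bounded by zeta_omega*
```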

For small d/τ, the 2ω=4C/(ντ)=2d/τ term inside the square is negligible compared to the log term. Then the bound is near

${\frac{4}{v\;\tau^{2}}\left\lbrack {1 + {\log\left( {\frac{1}{2} + \frac{\tau}{2d\sqrt{2\pi}}} \right)}} \right\rbrack} + {\frac{1}{\tau^{2}}.}$In particular, if snr is small the d=2C/ν is near 1 and the bound is near

${\frac{4}{v\;\tau^{2}}\left\lbrack {1 + {\log\left( {\frac{1}{2} + \frac{\tau}{2\sqrt{2\pi}}} \right)}} \right\rbrack} + {\frac{1}{\tau^{2}}.}$

Finally, consider the case δ_(c)=snr. The following lemma uses the formof Δ_(ζ,snr) given in the remark following Lemma 26.

Lemma 35.

Optimization of Δ_(ζ,δ_c) at δ_c=snr. The Δ_(ζ,snr) is the maximum of the expressions

${\overset{\_}{\Phi}(\zeta)} + {\frac{\left( {1 + {{D({snr})}/{snr}}} \right)}{\left( {1 + {snr}} \right)}\left( {1 + {{snr}\,{\overset{\_}{\Phi}(\zeta)}}} \right)\left( {1 + {\zeta/\tau}} \right)^{2}} - 1$and ${\overset{\_}{\Phi}(\zeta)} + {{D({snr})}/{{snr}.}}$The first expression in this max is approximately of the form b Φ̄(ζ)+2ζ/τ+c, optimized at

${\zeta = \sqrt{2\mspace{11mu}{\log\left( {{\tau\left( {1 + {2C}} \right)}/\left( {2\sqrt{2\pi}} \right)} \right)}}},$where b=1+2C and c is equal to the negative value (1/snr)log(1+snr)−1, at which Φ̄(ζ)≦φ(ζ)=2/(τb). This yields a bound for that expression near

${\frac{2}{\tau} + \frac{2\sqrt{2\mspace{11mu}{\log\left( {{\tau\left( {1 + {2C}} \right)}/\left( {2\sqrt{2\pi}} \right)} \right)}}}{\tau} + c},$with which one takes the maximum of it and

$\frac{2}{\tau\left( {1 + {2C}} \right)} + {\frac{D({snr})}{snr}.}$

Recall that D(snr)/snr≦snr/2. Because of the D(snr)/snr term, the Δ_(ζ,snr) is small only when snr is small. In particular, Δ_(ζ,snr) is less than a constant times √(log τ)/τ when snr is less than such.

In view of the ν=snr/(1+snr) factor in the denominator of Δ_(ζ,δ_match), one sees that min_ζ Δ_(ζ,snr) provides a better bound than Δ_(ζ,δ_match) for snr less than a constant times √(log τ)/τ.

Demonstration of Lemma 31 and its Corollary:

This Lemma concerns the optimization of ζ in the case δ_c=0. In this case 1+r₁/τ²=(1+ζ/τ)², the role of r_crit* is played by r_up, and the value of Δ_(ζ,0,simp) is

${\frac{1}{2C}\frac{{\left( {{2\tau} + \zeta} \right){\phi(\zeta)}} + {rem}}{\tau^{2}}} + {\frac{r_{1} + {\left( {{2\tau} + \zeta} \right){\phi(\zeta)}} + {rem}}{\tau^{2}}.}$Here rem=−(r₁−1) Φ̄(ζ), with r₁=2ζτ+ζ². Direct evaluation at ζ=0 gives a bound, at which r₁=0 and rem=1/2.

Let's optimize Δ_(ζ,0,simp) for the choice of ζ. The derivative of (2τ+ζ)φ(ζ)+rem with respect to ζ is seen to simplify to −2(τ+ζ) Φ̄(ζ). Accordingly, Δ_(ζ,0,simp) has derivative, up to the positive factor 1/τ²,

${2\left( {\tau + \zeta} \right)\left( {1 - {\left( {\frac{1}{2C} + 1} \right){\overset{\_}{\Phi}(\zeta)}}} \right)},$which is 0 at the ζ solving Φ̄(ζ)=2C/(2C+1), equivalently, Φ(ζ)=1/(2C+1). At this ζ, the quantities multiplying r₁, including the parts from the two occurrences of the remainder, are seen to cancel, such that the resulting value of Δ_(ζ,0,simp) is

$\frac{{\left( {1 + {1/\left( {2C} \right)}} \right)\left( {{2\tau} + \zeta_{C}} \right){\phi\left( \zeta_{C} \right)}} + 1}{\tau^{2}}.$With 2C>1, this ζ=ζ_C is negative, so Δ_(ζ,0,simp) is not more than

>1, this ζ=ζ_(C) is negative, so Δ_(ζ,0,simp) is not more than

$\frac{\left( {{2C} + 1} \right){\phi\left( \zeta_{C} \right)}}{C\,\tau} + {\frac{1}{\tau^{2}}.}$Per the inequality in the appendix for negative ζ, the φ(ζ_C) is not more than the value ξ(|ζ_C|)Φ(ζ_C)=ξ(|ζ_C|)/(2C+1), with ξ(|ζ|) the nondecreasing function equal to 2 for |ζ|≦1 and equal to |ζ|+1/|ζ| for |ζ| greater than 1. So at ζ=ζ_C, the Δ_(ζ,0,simp) is not more than

${\frac{\xi\left( \left| \zeta_{C} \right| \right)}{C\,\tau} + \frac{1}{\tau^{2}}},$where from 1/(2C+1)=Φ(ζ_C)≦(½)e^{−ζ_C²/2} it follows that |ζ_C|≦√(2 log(C+1/2)).

The coefficient ξ(|ζ_C|) improves on the (2C+1)/√(2π) from the ζ=0 case when (2C+1)/2 is less than the value val for which ξ(√(2 log val))=(2/√(2π))val. Evaluations show val to be between 2.64 and 2.65. So it is an improvement when 2C≧2 val−1=4.3, and C≧2.2 suffices. The improvement is substantial for large C.

Dividing the first term by (1+ζ_(C)/τ)² produces an upper bound onΔ_(ζ,0) when ζ_(C)>−τ. Exact minimization of Δ_(ζ,0) is possible, thoughit does not provide an explicit solution. Accordingly, instead use theζ_(C) that optimizes the simpler form and explore the implications ofthe division by (1+ζ_(C)/τ)².

Consider determination of conditions on the size C such that the bound on min_ζ Δ_(ζ,0) is an improvement over the ζ=0 choice. One can arrange the |ζ_C|/τ to be small enough that the factor (1+ζ_C/τ)² in the denominator remains sufficiently positive. At ζ=ζ_C, the bound on |ζ_C| of √(2 log(C+1/2)) is kept less than τ=√(2 log B)(1+δ_a) when B is greater than C, and |ζ_C|/τ is kept small if B is sufficiently large compared to C.

In particular, suppose B≧1+snr; then τ²/4 is at least C=(1/2)log(1+snr), that is, τ≧2√C, and (1+ζ_C/τ) is greater than 1−√(2 log(C+1/2))/(2√C), which is positive for all C≧1/2. Then for the non-zero ζ_C bound on Δ_(ζ,0) to provide improvement over the ζ=0 bound, it is sufficient that C be at least the value C₀ at which (2C+1)/√(2π) equals ξ(√(2 log(C+1/2))) divided by [1−√(2 log(C+1/2))/(2√C)]². Numerical evaluation reveals that C₀ is between 5.4 and 5.45.

Next, consider what to do for very large C, for which τ+ζ_C is either negative or not sufficiently positive to give an effective bound. This could occur if snr is large compared to B. To overcome this problem, let C_large be the value with ζ_{C_large}=−τ+1. For C≧C_large, use this ζ=ζ_{C_large} in place of ζ_C, so that τ+ζ=1 stays away from 0. Then upper bound Δ_(ζ,0) by replacing the appearance of C with C_large. This C_large has √(2 log(C_large+1/2))≧|ζ|=τ−1, so that
2C_large+1≧2e^{(τ−1)²/2}.
More stringently,

${\frac{1}{{2C_{large}} + 1} = {{\Phi(\zeta)} = {{\overset{\_}{\Phi}\left( {\tau - 1} \right)} \leq {\frac{1}{\tau - 1}{\phi\left( {\tau - 1} \right)}}}}},$from which 2C_large+1 is at least (τ−1)√(2π)e^{(τ−1)²/2}. Then for C≧C_large, at ζ=ζ_{C_large} the term

$\frac{{\tau\,\xi}\left( \left| \zeta \right| \right)}{{C\left( {\tau + \zeta} \right)}^{2}}$is less than

$\frac{2\tau^{2}}{{\left( {\tau - 1} \right)\sqrt{2\pi}{\mathbb{e}}^{{({\tau - 1})}^{2}/2}} - 1},$which is exponentially small in τ²/2 and hence of order 1/B to within a log factor. Consequently, for such very large C, this term is negligible compared to the 1/τ².

Finally, consider the matter of the range of C for which the expression in the first term, (2C+1)φ(ζ_C)/[Cτ(1+ζ_C/τ)²], is decreasing in C even with the presence of the division by (1+ζ_C/τ)². Taking the derivative of this expression with respect to C, one finds that there is a C_crit, with value of ζ_{C_crit} not much greater than −τ, such that the expression is decreasing for C up to C_crit, after which, for larger C, it becomes preferable to use ζ=ζ_{C_crit} in place of ζ_C, though the determination of C_crit is not explicit. Nevertheless, one finds that at C=C_large, where ζ_C=−τ+1, the derivative of the indicated expression is still negative, and hence C_large≦C_crit. Thus the obtained bound is monotonically decreasing for C up to C_large, and thereafter the bound for the first term is negligible. This completes the demonstration of Lemma 31 and its corollary.

Demonstration of Lemma 33:

Recall for negative ζ that ψ is bounded by 2τ[|ζ|+1/|ζ|]. Likewise r̃_crit*/τ² is bounded by γ²/[2(1+ζ/τ)²]. Using γ≦2/(|ζ|τ), this yields r̃_crit*/τ² less than 2/[ζ²(τ+ζ)²]. Plugging in the chosen ζ=ζ_(1/3) produces the claimed bound for that case. Likewise, directly plugging ζ=0 into the terms of Δ_(ζ,δ_match) provides a bound for that case. This completes the demonstration of Lemma 33.

Demonstration of Lemma 34:

As previously developed, at δ_c=δ_match, the form of Δ_(ζ,δ_match) simplifies to

$\frac{\psi}{2C{\tau}^{2}\left( {1 + {\zeta/\tau}} \right)^{2}} + {\frac{1}{v}\frac{{\overset{\sim}{r}}_{crit}^{*}}{\tau^{2}}} + {\frac{1 + \varepsilon}{\tau^{2}}.}$Now by Corollary 28, with ζ>0,
r̃_crit*≦2(ζ+φ(ζ)/Φ(ζ))².

Also ψ(ζ)=2τ(1+ζ/2τ)φ(ζ)/Φ(ζ), and the (1+ζ/2τ) factor is canceled by the larger (1+ζ/τ)² in the denominator. Accordingly, Δ_(ζ,δ_match) has the upper bound

$\frac{\phi(\zeta)}{C\,\tau\,{\Phi}(\zeta)} + {\frac{2}{v\;\tau^{2}}\left( {\zeta + \frac{\phi(\zeta)}{\Phi(\zeta)}} \right)^{2}} + {\frac{1 + \varepsilon}{\tau^{2}}.}$Plugging in ζ=ζ_ω, for which φ(ζ)/Φ(ζ)=2ω, produces the claimed bound

$\frac{2\omega}{C\,\tau} + {\frac{2}{v\;\tau^{2}}\left( {\zeta_{\omega} + {2\omega}} \right)^{2}} + {\frac{1 + \varepsilon}{\tau^{2}}.}$

To produce an explicit upper bound on Δ_(ζ,δ_match), replace the Φ(ζ) in the denominator with its lower bound 1−√(2π)φ(ζ)/2, for ζ≧0. This lower bound agrees with Φ(ζ) at ζ=0 and in the limit of large ζ. The resulting upper bound on Δ_(ζ,δ_match) is

$\frac{\phi(\zeta)}{C\,\tau\left( {1 - {\sqrt{2\pi}{{\phi(\zeta)}/2}}} \right)} + {\frac{2}{v\;\tau^{2}}\left( {\zeta + \frac{\phi(\zeta)}{\left( {1 - {\sqrt{2\pi}{{\phi(\zeta)}/2}}} \right)}} \right)^{2}}$plus (1+ε)/τ².

The bound on φ(ζ)/Φ(ζ) of φ(ζ)/(1−√(2π)φ(ζ)/2) is found to equal 2ω when √(2π)φ(ζ) equals 2/[1+1/(ω√(2π))], at which ζ=ζ* is

$\zeta^{*} = {\sqrt{2{\log\left( {\frac{1}{2} + \frac{1}{2\omega\sqrt{2\pi}}} \right)}}.}$Accordingly, this ζ* upper bounds ζ_ω, and the resulting bound on Δ_(ζ*,δ_match) is

$\frac{2\omega}{C\,\tau} + {\frac{2}{v\;\tau^{2}}\left( {\zeta^{*} + {2\omega}} \right)^{2}} + {\frac{1 + \varepsilon}{\tau^{2}}.}$Using 2ω/(Cτ)=4/(ντ²), it is

${\frac{2}{v\;\tau^{2}}\left\lbrack {2 + \left( {\zeta^{*} + {2\omega}} \right)^{2}} \right\rbrack} + {\frac{1 + \varepsilon}{\tau^{2}}.}$This completes the demonstration of Lemma 34.

To provide further motivation for the choice ζ_ω, the derivative with respect to ζ of the above expression bounding Δ_(ζ,δ_match) for ζ≧0 is seen, after factoring out (4/ν)(ζ+φ/Φ), to equal

${1 - {\frac{1}{2\omega}\frac{\phi}{\Phi}} - {\left( {\zeta + \frac{\phi}{\Phi}} \right)\frac{\phi}{\Phi}}},$where the last term is negligible if ζ is not small. The first two terms yield 0 at ζ=ζ_ω. Some improvement arises by exact minimization. Set the derivative to 0 including the last term, noting that it takes the form of a quadratic in φ/Φ. Then at the minimizer, φ/Φ equals [√((ζ+1/2ω)²+4)−(ζ+1/2ω)]/2, which is less than 1/(ζ+1/2ω)≦2ω.

For further understanding of the choice of ζ, note that for ζ not small, Φ(ζ) is near 1, and the expression to bound is near φ(ζ)/(Cτ)+2ζ²/(ντ²), which by analysis of its derivative is seen to be minimized at the positive ζ for which φ(ζ) equals 4C/(ντ)=2ω. It is

$\zeta_{1} = {\sqrt{2\log\;{1/\left( {2\omega\sqrt{2\pi}} \right)}}.}$One sees that ζ* is similar to ζ₁, but has the addition of ½ inside the logarithm, which is advantageous in allowing ω up to 1/√(2π). The difference between the use of ζ* and ζ₁ is negligible when they are large (i.e. when ω is small); nevertheless, numerical evaluation of the resulting bound shows ζ* to be superior to ζ₁ for all ω≦1/√(2π).

In the next section the rate expression is used to solve for the optimalchoices of the remaining parameters.

10 Optimizing Parameters for Rate and Exponent

In this section the parameters are determined that maximize thecommunication rate for a given error exponent. Moreover, in the smallexponent (large L) case, the rate and its closeness to capacity aredetermined as a function of the section size B and the signal to noiseratio snr.

Recall that the rate of the sparse superposition inner code is

${R = \frac{\left( {1 - h^{\prime}} \right)C}{\left( {1 + \delta_{a}} \right)^{2}\left( {1 + \delta_{sum}^{2}} \right)\left( {1 + {r/\tau^{2}}} \right)}},$with (1−h′)=(1−h)(1−h_f). The inner code makes a weighted fraction of section mistakes bounded by δ_m=δ*+η+f̄ with high probability, as shown previously herein. If one multiplies the weighted fraction by the factor 1/[L min_l π_(l)], which equals fac=snr(1+δ_sum²)/[2C(1+δ_c)], then it provides an upper bound on the (unweighted) fraction of mistakes δ_mis=fac δ_m, equal to
δ_mis=fac(δ*+η+f̄).
So with the Reed-Solomon outer code of rate 1−δ_mis, which corrects the remaining fraction of mistakes, the total rate of the code is

$R_{tot} = {\frac{\left( {1 - \delta_{mis}} \right)\left( {1 - h^{\prime}} \right)C}{\left( {1 + \delta_{sum}^{2}} \right)\left( {1 + \delta_{a}} \right)^{2}\left( {1 + {r/\tau^{2}}} \right)}.}$This multiplicative representation is appropriate considering the manner in which the contributions arise. Nevertheless, in choosing the parameters in combination, it is helpful to consider convenient and tight lower bounds on this rate, via an additive expression of the rate drop from capacity.

Lemma 36.

Additive representation of rate drop: With a non-negative value for r, represented as in Remark 3 above, the rate R_tot is at least (1−Δ)C with Δ given by

$\Delta = {\frac{{snr}\;\delta^{*}}{\left( {1 + \delta_{c}} \right)2C} + \frac{r_{crit}^{*}}{\tau^{2}} + \frac{\left( {1 + {r_{1}/\tau^{2}}} \right){D\left( \delta_{c} \right)}}{snr} + {\frac{snr}{2C}\left( {\eta + \overset{\_}{f}} \right)} + {{snr}\mspace{14mu}{gap}} + h_{f} + h + {2\delta_{a}} + {\frac{2C}{L\; v}.}}$

These are called, respectively, the first and second lines of the expression for Δ. The first line of Δ is what is also denoted in the introduction as Δ_shape, or in the previous section as Δ_ζ, to emphasize its dependence on ζ, which determines the values of r₁, δ_c, and δ*. In contrast, the second line of Δ, which is denoted Δ_second, depends on η, f̄, and a. It has the ingredients of Δ_alarm and the quantities which determine the error exponent.

Demonstration of Lemma 36:

Consider first the ratio

$\frac{1 - \delta_{mis}}{\left( {1 + \delta_{sum}^{2}} \right)\left( {1 + {r/\tau^{2}}} \right)}.$Splitting according to the two terms of the numerator, and using the non-negativity of r, it is at least

$\frac{1}{\left( {1 + \delta_{sum}^{2}} \right)\left( {1 + {r/\tau^{2}}} \right)} - {\frac{\delta_{mis}}{1 + \delta_{sum}^{2}}.}$From the form of fac, the ratio δ_mis/(1+δ_sum²) subtracted here is equal to

${\frac{snr}{2C}\frac{\left( {\delta^{*} + \eta + \overset{\_}{f}} \right)}{\left( {1 + \delta_{c}} \right)}},$where, in bounding it further, drop the (1+δ_c) from the terms involving η+f̄, but find it useful to retain the term involving δ*.

Concerning the factors of the first part of the above difference, use δ_sum²≦D(δ_c)/snr+2C/(Lν) to bound the factor (1+δ_sum²) by
(1+D(δ_c)/snr)(1+2C/(Lν)),
and use the representation of (1+D(δ_c)/snr)(1+r/τ²) developed at the end of the previous section,

$\left( {1 + \frac{\left( {1 + {r_{1}/\tau^{2}}} \right){D\left( \delta_{c} \right)}}{snr} + \frac{r_{crit}^{*}}{\tau^{2}}} \right)\left( {1 + {\xi\;{snr}\mspace{14mu}{gap}}} \right)$to obtain that the first part of the above difference is at least

$1 - {\left\lbrack {\frac{r_{crit}^{*}}{\tau^{2}} + \frac{\left( {1 + {r_{1}/\tau^{2}}} \right){D\left( \delta_{c} \right)}}{snr} + {{snr}\mspace{14mu}{gap}} + \frac{2C}{L\; v}} \right\rbrack.}$Proceed in this way, including also the factors (1−h′) and 1/(1+δ_a)², to produce the indicated bound on the rate drop from capacity. This bound is tight when the individual terms are small, because then the products are negligible in comparison. Here it is used that 1/(1+δ_i)≧1−δ_i and that (1−δ₁)(1−δ₂) exceeds 1−δ₁−δ₂ for non-negative reals δ_i, where the amount by which it exceeds is the product δ₁δ₂. Likewise, inductively, products Π_i(1−δ_i) exceed 1−Σ_i δ_i. This completes the demonstration of Lemma 36.

This additive form of Δ provides some separation of effects thatfacilitates joint optimization of the parameters as in the next Lemma.Nevertheless, once the parameters are chosen, it is preferable toreexpress the rate in the original product form because of the slightlylarger value it provides.

Let's recall the parameters that arise in this rate and how they are interrelated. For the incremental false alarm target use

${f^{*} = {\frac{1}{\sqrt{2\pi}\sqrt{2\log\; B}}{\mathbb{e}}^{{- a}\sqrt{2\log\; B}}}},$such that

$\delta_{a} = {\frac{\log\;{1/\left\lbrack {f^{*}\sqrt{2\pi}\sqrt{2\log\; B}} \right\rbrack}}{2\;\log\; B}.}$With a number of steps m at least 2, and with ρ at least 1, the total false alarms are controlled by f̄=mρf*, and the exponent associated with failed detections is determined by a positive η. Set h_f equal to 2snr f̄ plus the negligible ε₃=2snr√((1+snr)k/L_π)+snr/L_π, arising in the determination of the weights of combination of the test statistic. To control the growth of correct detections, set
gap=η+f̄+1/(m−1).
The r₁, r_crit*, δ* and δ_c are determined as in the preceding section as functions of the positive parameter ζ.

The exponent of the error probability e^{−L_π ε} is ε=ε_η, either given by
ε_η=2η²
or, if the Bernstein bound is used, by

$\frac{1}{2}\frac{L}{L_{\pi}}\frac{\eta^{2}}{V + {\left( {1/3} \right)\eta\;{L/L_{\pi}}}}$where V is the minimum value of the variance function discussed previously. For the chosen power allocation, the L_π=1/max_l π_(l) has L/L_π equal to (2C/ν)(1+δ_sum²), which may be replaced by its lower bound (2C/ν), yielding

$ɛ_{\eta} = {\frac{\eta^{2}}{{V\;{v/C}} + {\left( {2/3} \right)\eta}}.}$In both cases the relationship between ε and η is strictly increasing on η>0 and invertible, such that for each ε≧0 there is a unique corresponding η(ε)≧0.

Set the Chi-square concentration parameter h so that the exponent (n−m+1)h_m²/2 matches L_π ε_η, where h_m equals (nh−m+1)/(n−m+1). Thus h_m=√(2ε_η L_π/(n−m+1)), which means
h=(m−1)/n+√(2ε_η L_π(n−m+1))/n.
With L_π≦(ν/2C)L not more than (ν/2)n/log B, it yields h not more than (m−1)/n+h*, where
h*=√(νε_η/log B).
The part (m−1)/n, which is (m−1)C/(L log B), is lumped with the above-mentioned remainders 2C/(Lν) and ε₃, as negligible for large L.

Finally, ρ>1 is chosen such that the false alarm exponent f̄D(ρ−1)/ρ matches ε_η. The function D(ρ−1)/ρ=log ρ−1+1/ρ is 0 at ρ=1 and is an increasing function of ρ≧1 with unbounded positive range, so it has an inverse function ρ(ε), at which set ρ=ρ(ε_η/f̄).
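Although the inverse is not explicit, the function is monotone and hence easy to compute; a minimal sketch (illustrative Python) inverts log ρ−1+1/ρ by bisection, as needed to set ρ=ρ(ε_η/f̄).

```python
import math

def rho_of(target):
    """Invert g(rho) = log(rho) - 1 + 1/rho, increasing for rho >= 1."""
    g = lambda r: math.log(r) - 1 + 1 / r
    lo, hi = 1.0, 2.0
    while g(hi) < target:      # bracket the root
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(rho_of(0.1))   # e.g. eps_eta / fbar = 0.1 gives rho near 1.62
```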

Herein the inventors pin down as many of these values as they can by exploring the best relationship between rate and error probability achieved by the analysis of the invented decoder.

Take advantage of the decomposition of Lemma 36.

Lemma 37.

Optimization of the second line of Δ. For any given positive η providing the exponent ε_η of the error probability, the values of the parameters m, f̄, and ρ are specified to optimize their effect on the communication rate. The second line Δ_second of the bound Δ on the total rate drop (C−R)/C is the sum of three terms
Δ_m+Δ_f̄+Δ_η,
plus the negligible Δ_L=2C/(Lν)+(m−1)C/(L log B)+ε₃. Here

$\Delta_{m} = {\frac{snr}{m - 1} + \frac{\log\; m}{\log\; B}}$is optimized at a number of steps m equal to an integer part of 2+snr log B, at which Δ_m is not more than

$\frac{1}{\log\; B} + {\frac{\log\left( {2 + {{snr}\;\log\; B}} \right)}{\log\; B}.}$Likewise Δ_f̄ is given by

${{{{snr}\left( {3 + {1/\left( {2C} \right)}} \right)}\overset{\_}{f}} - \frac{\log\left( {\overset{\_}{f}\sqrt{2\pi}\sqrt{2\log\; B}} \right)}{\log\; B}},$optimized at the false alarm level f̄=1/[snr(3+1/(2C))log B], at which

$\Delta_{\overset{\_}{f}} = {\frac{1}{\log\; B} + {\frac{\log\left( {{{snr}\left( {3 + {1/\left( {2C} \right)}} \right)}{\sqrt{\log\; B}/\sqrt{4\pi}}} \right)}{\log\; B}.}}$The Δ_η is given by

$\Delta_{\eta} = {{\eta\;{{snr}\left( {1 + {1/\left( {2C} \right)}} \right)}} + \frac{\log\;\rho}{\log\; B} + h^{*}}$evaluated at the optimal ρ=ρ(ε_η/f̄). It yields Δ_η not more than
η snr(1+1/(2C))+ε_η snr(3+1/(2C))+1/log B+h*.

Together the optimized Δ_m+Δ_f̄ form what is called Δ_alarm in the introduction. In the next lemma, the Δ_η expression, or its inverse, is used to relate the error exponent to the rate drop.

Demonstration of Lemma 37:

Recall that

${2\delta_{a}} = {\frac{\log\left\lbrack {\rho\;{m/\left( {\overset{\_}{f}\sqrt{2\pi}\sqrt{2\;\log\; B}} \right)}} \right\rbrack}{\log\; B}.}$The log of the product is the sum of the logs. Associate the term log m/log B with Δ_m and the term log ρ/log B with Δ_η, and leave the rest of 2δ_a as part of Δ_f̄. The rest of the terms of Δ associate in the obvious way. Decomposed in this way, the stated optimizations of Δ_m and Δ_f̄ are straightforward.

For Δ_m=snr/(m−1)+(log m)/(log B), consider it first as a function of real values m≧2. Its derivative is −snr/(m−1)²+1/(m log B), which is negative at m₁=1+snr log B, positive at m₂=2+snr log B, and equal to 0 at a point m₂*=[m₂+√(m₂²−4)]/2 in between m₁ and m₂. Moreover, the value of Δ_m at m₂ is seen to be smaller than the value at m₁. Accordingly, for m in the interval m₁<m≦m₂, which includes an integer value, the Δ_m remains below what is attained for m≦m₁. Therefore, the minimum among integers occurs either at the floor ⌊2+snr log B⌋ or at the ceiling ⌈2+snr log B⌉ of m₂, whichever produces the smaller Δ_m. [Numerical evaluation confirms that the optimizer tends to coincide with the rounding of m₂* to the nearest integer, consistent with a near quadratic shape of Δ_m around m₂*, by Taylor expansion for m not far from m₂*.]

When the optimal integer m is less than or equal to m₂=2+snr log B, use that it exceeds m₁ to conclude that Δ_m≦1/log B+(log m₂)/(log B). When the optimal m is a rounding up of m₂, use snr/(m−1)≦snr/(1+snr log B). Also, log m exceeds log m₂ by the amount log(m/m₂)≦log(1+1/m₂), less than 1/(1+snr log B), to obtain that at the optimal integer, Δ_m remains less than

$\frac{1}{\log\; B} + {\frac{\log\; m_{2}}{\log\; B}.}$
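The integer optimization of Δ_m is a two-candidate comparison; the sketch below (illustrative Python) evaluates the floor and ceiling of m₂=2+snr log B and checks the stated bound.

```python
import math

def Delta_m(m, snr, B):
    return snr / (m - 1) + math.log(m) / math.log(B)

def best_m(snr, B):
    m2 = 2 + snr * math.log(B)
    return min(math.floor(m2), math.ceil(m2), key=lambda m: Delta_m(m, snr, B))

snr, B = 7.0, 2 ** 16
m = best_m(snr, B)
bound = 1 / math.log(B) + math.log(2 + snr * math.log(B)) / math.log(B)
print(m, Delta_m(m, snr, B), bound)   # optimum Delta_m vs the stated bound
```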

For Δ_f̄ and Δ_η there are two ways to proceed. One is to use the above expression for δ_a, and set Δ_f̄ as indicated, which is easily optimized by setting f̄ at the value specified.

For Δ_η, note that the log ρ/log B has numerator log ρ equal to 1−1/ρ+ε_η/f̄ at the optimized ρ; accordingly, get the claimed upper bound by dropping the subtraction of 1/ρ. This completes the demonstration of Lemma 37.

It is noted that, in accordance with the inverse function ρ(ε_η/f̄), there is an indirect dependence of the rate drop on f̄ when ε_η>0. One can jointly optimize Δ_f̄+Δ_η over f̄ for given η, though there is no explicit formula for that solution. The optimization claimed is for Δ_f̄, which produces a clean expression suitable for use with small positive η.

A closely related presentation is to write

${2\delta_{a}} = \frac{\log\left\lbrack {m/\left( {{\overset{\_}{f}}^{*}\sqrt{2\pi}\sqrt{2\;\log\; B}} \right)} \right\rbrack}{\log\; B}$and, in other terms involving f̄, write it as ρ f̄*. Optimization of

${{{{snr}\left( {3 + {1/\left( {2C} \right)}} \right)}\rho\;{\overset{\_}{f}}^{*}} - \frac{\log\left( {{\overset{\_}{f}}^{*}\sqrt{2\pi}\sqrt{2\;\log\; B}} \right)}{\log\; B}},$occurs at a baseline false alarm level f̄* that is equal to 1/[ρ snr(3+1/(2C))log B]. These approaches have the baseline level of false alarms (as well as the final value of δ_a) depending on the subsequent choice of ρ.

One has a somewhat cleaner separation in the story, as in theintroduction, if f* is set independent of ρ. This is accomplished by adifferent way of spitting the terms of Δ_(second). One writes f=ρ f* asf*+(ρ−1) f*, the baseline value plus the additional amount required forreliability. Then set Δ _(f*) to equal

${{{{snr}\left( {3 + {{1/2}\; C}} \right)}{\overset{\_}{f}}^{*}} - \frac{\log\left( {{\overset{\_}{f}}^{*}\sqrt{2\pi}\sqrt{2\;\log\; B}} \right)}{\log\; B}},$ optimized at f̄*=1/[snr(3+1/(2C))log B], which determines a value of δ_(a) for the rate drop envelope independent of η. In that approach one replaces Δ_(η) with ηsnr(1+1/(2C))+(ρ−1)snr(3+1/(2C))+h*, with ρ defined to solve f̄*D(ρ)=ε_(η). There is no explicit solution for the inverse of D(ρ) at ε_(η)/f̄*. Nevertheless, a satisfactory bound for small η is obtained by replacing D(ρ) by its lower bound 2(√ρ−1)², which can be explicitly inverted. Perhaps a downside is that, from the form of the f̄* which minimizes Δ_(f̄*), one ends up, multiplying by ρ, with a final f̄ larger than before.

With 2(√ρ−1)² replacing D(ρ), it is matched to 2η²/f̄* by setting √ρ−1=η/√f̄* and solving for ρ by adding 1 and squaring. The resulting expression used in place of Δ_(η) is then quadratic in η, and its root provides the means by which to express the relationship between rate drop and error exponent. Then ρf̄* is (√f̄*+η)².
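A minimal sketch of this inversion, with η and f̄* as assumed illustrative values (the variable names are hypothetical):

import math

eta, fbar_star = 0.01, 1e-3                   # illustrative values only

rho = (1 + eta / math.sqrt(fbar_star)) ** 2   # from sqrt(rho) - 1 = eta/sqrt(fbar*)
f_final = rho * fbar_star                     # the enlarged false alarm level
# agrees with the closed form (sqrt(fbar*) + eta)**2 noted above
assert abs(f_final - (math.sqrt(fbar_star) + eta) ** 2) < 1e-15
print(rho, f_final)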

A twist here is that in solving for the best f̄*, rather than starting from η=0, one may incorporate positive η in the optimization of

${{{{snr}\left( {3 + {{1/2}\; C}} \right)}\left( {\sqrt{{\overset{\_}{f}}^{*}} + \eta} \right)^{2}} - \frac{\log\left( {{\overset{\_}{f}}^{*}\sqrt{2\pi}\sqrt{2\;\log\; B}} \right)}{\log\; B}},$ for which, taking the derivative with respect to √f̄* and setting it to 0, a solution for this optimization is obtained as the root of a quadratic equation in √f̄*. Upon adding to that the other relevant terms of Δ_(second), namely ηsnr(1+1/(2C))+h*, one would have an explicit, albeit complicated, expression remaining in η.

Set Δ_(B)=Δ(snr, B) equal to Δ_(shape)+Δ_(m)+Δ_(f̄) at the above values of ζ, m, f̄. This Δ_(B) provides the rate drop envelope as a function only of snr and B. It corresponds to the large L regime in which one may take η to be small. Accordingly, Δ_(B) provides the boundary of the behavior by evaluating Δ with η=0.

The given values of m and f̄ optimize Δ_(B), and the given ζ provides a tight bound, approximately optimizing the rate drop envelope Δ_(B). The associated total rate R_(tot) evaluated at these choices of parameters with η=0, denoted C_(B), is at least C(1−Δ_(B)). The associated bound on the fraction of mistakes of the inner code is δ_(mis)*=(snr/(2C))(δ*−f̄).

Express the Δ_(η) bound as a strictly increasing function of the error exponent ε

${{\eta(ɛ)}{{snr}\left( {1 + {{1/2}\; C}} \right)}} + {ɛ\left( {3 + {{1/2}\; C}} \right)} + \frac{1 - {1/{\rho\left( {ɛ/\overset{\_}{f}} \right)}}}{\log\; B} + \sqrt{v\;{ɛ/\log}\; B}$ and let ε(Δ) denote its inverse for Δ≧0 [recognizing also, per the statement of the Lemma above, the cleaner upper bound dropping the 1/ρ(ε/f̄)/log B term]. The part η(ε)snr/(2C) within the first term is from the contribution to 2δ_(mis) in the outer code rate. From the rate drop of the superposition inner code, the rest of Δ_(η) written as a function of ε is denoted Δ_(η,super), and let ε_(super)(Δ) denote its inverse function.

For a given total rate R_(tot)<C_(B), an associated error exponent ε is ε((C_(B)−R_(tot))/C), which is the evaluation of that inverse at (C_(B)−R_(tot))/C. Alternatively, in place of C_(B) its lower bound C(1−Δ_(B)) may be used, and so take the error exponent to be ε(1−R_(tot)/C−Δ_(B)). Either choice provides an error exponent of a code of that specified total rate.

To arrange the constituents of this code, use the inner code mistake rate bound δ_(mis)=fac(δ*+f̄+η(ε)), and set the inner code rate target R=R_(tot)/(1−δ_(mis)). Accordingly, for any number of sections L, set the codelength n to be L log B/R rounded to an integer, so that the inner code rate L log B/n agrees with the target rate to within a factor of 1±1/n, and the total code rate (1−δ_(mis))R agrees with R_(tot) to within the same precision.

Theorem 38.

Rate and Reliability of the composite code: As a function of the section size B, let C_(B) and its lower bound C(1−Δ_(B)) be the rate envelopes given above, both near the capacity C for B large. Let a positive R_(tot)<C_(B) be given. If R_(tot)≦C(1−Δ_(B)), set the error exponent ε by ε(1−Δ_(B)−R_(tot)/C). Alternatively, to arrange the somewhat larger exponent with η such that Δ_(η)=(C_(B)−R_(tot))/C, suppose that Δ_(η)≧δ_(mis); then set ε=ε_(η), that is, ε=ε((C_(B)−R_(tot))/C). To allow any R_(tot)<C_(B) without further condition, there is a unique η>0 such that Δ_(η,super)C=C_(B)/(1−δ_(mis)*)−R_(tot)/(1−δ_(mis)), at which set ε=ε_(η). In any of these three cases, for any number of sections L, the code consisting of a sparse superposition code and an outer Reed-Solomon code, having composite rate equal to R_(tot) to within the indicated precision, has probability of error not more than κe^(−L_(π)ε), which is exponentially small in L_(π), near Lν/(2C), where κ=m(1+snr)^(1/2)B^(c)+2m is a polynomial in B with c=snr C, and with number of steps m equal to the integer part of 1+snr log B.

Demonstration of Theorem 38 for Rate Assumption R_(tot)<C(1−Δ_(B)):

Set η>0 such that Δ_(η)=1−Δ_(B)−R_(tot)/C. Then the rate R_(tot) is expressed in the form C(1−Δ_(B)−Δ_(η)). In view of Lemma ?? and the development preceding it, this rate C(1−Δ)=C(1−Δ_(B)−Δ_(η)) is a lower bound on a rate of the established form (1−δ_(mis))C/(1+r/τ²), with parameter values that permit the decoder to be accumulative up to a point x* with shortfall δ*, providing a fraction of section mistakes not more than δ_(mis)=fac(δ*+η+f̄), except in an event of the indicated probability with exponent ε_(η)=ε(Δ_(η)). This fraction of mistakes is corrected by the outer code. The probability of error bound from the earlier theorem herein is m(1+snr)^(1/2)B^(c)e^(−L_(π)ε)+2me^(−L_(π)ε). With m≦1+snr log B it is not more than the given κe^(−L_(π)ε). The other part of the Theorem asserts a similar conclusion but with an improved exponent associated with arranging Δ_(η)=(C_(B)−R_(tot))/C, that is, R_(tot)=C_(B)(1−Δ_(η)). The inventors return to demonstrate that conclusion as a corollary of the next result.

One has the option to state the performance scaling results in terms of properties of the inner code. At any section size B, recognize that Δ_(B) above, at the η=0 limit, splits into a contribution from δ_(mis)*=(snr/(2C))(f̄+δ*/(1+δ_(c))) and the rest, which is a bound on the rate drop of the inner superposition code, denoted Δ_(super)* in this small η limit. The rate envelope for such superposition codes is

${C_{super}^{*} = \frac{\left( {1 - {2\;{snr}\;\overset{\_}{f}}} \right)C}{\left( {1 + {{D\left( \delta_{c} \right)}/{snr}}} \right)\left\lbrack {\left( {1 + \delta_{a}} \right)^{2} + {r/\left( {2\;\log\; B} \right)}} \right\rbrack}},$ evaluated at f̄, δ_(a), δ_(c), r and ζ as specified above, with η=0, h=0 and ρ=1, again with a number of steps m equal to the integer part of 1+snr log B. It has

C_(super)*≧C(1−Δ_(super)*). Likewise recall that Δ_(η) splits into the part η(ε)snr/(2C) associated with δ_(mis) and the rest Δ_(η,super) expressed as a function of ε, for which ε_(super)(Δ) is its inverse.

Theorem 39.

Rate and Reliability of the Sparse Superposition Code: For any rate R<C_(super)*, let ε equal ε_(super)((C_(super)*−R)/C). Then for any number of sections L, the rate R sparse superposition code with adaptive successive decoder makes a fraction of section mistakes less than δ_(mis)*+η(ε)snr/(2C), except in an event of probability less than κe^(−L_(π)ε).

This conclusion about the sparse superposition code would also hold for values of the parameters other than those specified above, producing related tradeoffs between rate and the reliable fraction of section mistakes. The particular choices of these parameters made above are specific to the tradeoff that produces the best total rate of the composite code.

Demonstration of Theorem 39.

In view of the preceding analysis, what remains to establish is that the rate C_(super)*(1−Δ_(η,super)) is not more than the rate expression

$\frac{{C\left( {1 - h_{f}} \right)}\left( {1 - h^{*}} \right)}{\left( {1 + {{D\left( \delta_{c} \right)}/{snr}}} \right)\left( {1 + \delta_{a,\rho}} \right)^{2}\left( {1 + {r_{\eta}/\tau^{2}}} \right)}$ where Δ_(η,super), which is ηsnr(1+r₁/(2 log B))+ε_(η)(3+1/C)(1+1/log B)+√(νε/log B), is at least ηsnr(1+r₁/τ²)+(log ρ)/log B+h*, with ρ and h* satisfying the conditions of the Lemma, so that (once one accounts for the negligible remainder in 1/L) the indicated reliability holds with this rate. Here it is denoted that δ_(a,ρ)=δ_(a)+(log ρ)/(2 log B), to distinguish the value that occurs with ρ>1 from the value at ρ=1 used in the definition of C_(super)*. Likewise r_(η)/τ² is written for the expression r/τ²+ηsnr(1+r₁/τ²), to distinguish the value that occurs with η>0 from the value of r/τ² at η=0 used in the definition of C_(super)*. Factoring out terms in common, what is to be verified is that

$\frac{1 - \Delta_{\eta,{super}}}{\left( {1 + \delta_{a}} \right)^{2}\left( {1 + {r/\tau^{2}}} \right)}$ is not more than

$\frac{1 - h^{*}}{\left( {1 + \delta_{a,\rho}} \right)^{2}\left( {1 + {r_{\eta}/\tau^{2}}} \right)}.$ This is seen to be true by cross multiplying, rearranging, expanding the square in (1+δ_(a)+log ρ/(2 log B))², using the lower bound on Δ_(η,super), and comparing term by term for the parts involving h*, log ρ and η. This completes the demonstration of Theorem 39.

Next the rest of Theorem 38 is demonstrated, in view of what has been established. For the general rate condition R_(tot)<C_(B), for η≧0 the expression

${\Delta_{\eta,{super}}C} + \frac{R_{tot}}{1 - \delta_{mis}^{*} - {{{snr}\;\eta}/{\left( {2\; C} \right)}}}$ is a strictly increasing function of η in the interval [0, (2C/snr)(1−δ_(mis)*)), where the second term in this expression may be interpreted as the rate R of an inner code with total rate R_(tot). This function starts at η=0 at the value R_(tot)/(1−δ_(mis)*), which is less than C_(B)/(1−δ_(mis)*), which is C_(super)*. So there is an η in this interval at which this function hits C_(super)*. That is, Δ_(η,super)C+R=C_(super)*, or equivalently, Δ_(η,super)=(C_(super)*−R)/C. So Theorem 39 applies with exponent ε_(super)((C_(super)*−R)/C).

Finally, to obtain the exponent ε((C_(B)−R_(tot))/C), let Δ_(η)=(C_(B)−R_(tot))/C. Examine the rate C_(B)(1−Δ_(η)), which is (1−δ_(mis)*)C_(super)*(1−Δ_(η,super)−ηsnr/(2C)), and determine whether it is not more than the following composite rate (obtained using the established inner code rate), (1−δ_(mis)*−ηsnr/(2C))C_(super)*(1−Δ_(η,super)). These match to first order. Factoring out C_(super)* and canceling terms shared in common, the question reduces to whether −(1−δ_(mis)*) is not more than −(1−Δ_(η,super)), that is, whether δ_(mis)* is not more than Δ_(η,super), or equivalently, whether δ_(mis) is not more than Δ_(η), which is the condition assumed in the Theorem for this case. This completes the demonstration of Theorem 38.

11 Lower Bounds on Error Exponent

The second line of the rate drop can be decomposed as Δ_(m)+Δ_(f̄*)+Δ_(η,ρ), where

$\Delta_{m} = {\frac{snr}{m - 1} + \frac{\log\; m}{\log\; B}}$ optimized at a number of steps m equal to an integer part of 2+snr log B. Further,

$\Delta_{{\overset{\_}{f}}^{*}} = {{\vartheta\;{\overset{\_}{f}}^{*}} - \frac{\log\left( {{\overset{\_}{f}}^{*}\sqrt{2\pi}\sqrt{2\;\log\; B}} \right)}{\log\; B}}$ where θ=snr(3+1/(2C)). The optimum value of f̄* equals 1/[θ log B], and Δ_(η,ρ)=ηθ₁+(ρ−1)/log B+h. Here θ₁=snr(1+1/(2C)).

The Δ_(η,ρ) is a strictly increasing function of the error exponent ε, where Δ_(η,ρ)=θ₁η(ε)+(ρ−1)/log B+h. Let R_(tot)≦C_(B) be given. The objective is to find the error exponent ε*=ε((C_(B)−R_(tot))/C), where ε solves the above equation with Δ_(η,ρ)=(C_(B)−R_(tot))/C. That is, Δ_(η,ρ)=θ₁η(ε)+(ρ−1)/log B+h, where ρ=ρ(ε/f̄*).

Now ρ−1=(√ρ−1)(√ρ+1), which is (√ρ−1)²+2(√ρ−1). Correspondingly, using ε≧2f̄*(√ρ−1)², it follows that ε/(2f̄*)+√(2ε/f̄*)≧ρ−1. Further, using η(ε)=√(ε/2) and h*=√(νε/log B), one gets that

$\Delta_{\eta,\rho} \leq {c_{1}ɛ} + {c_{2}\sqrt{ɛ}}$, where c₁=θ/2 and $c_{2} = {\left\lbrack {\frac{\vartheta_{1}}{\sqrt{2}} + \sqrt{2\;{\vartheta/\log}\; B} + \sqrt{\frac{v}{\log\; B}}} \right\rbrack.}$

Solving this quadratic in √ε, one gets that

${ɛ \geq ɛ_{sol}} = {\left\lbrack \frac{{- c_{2}} + \sqrt{c_{2}^{2} + {4\Delta_{\eta,\rho}c_{1}}}}{2\; c_{1}} \right\rbrack^{2}.}$
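A minimal check of this root formula, with c₁, c₂ and the rate drop Δ_(η,ρ) as assumed example values:

import math

def epsilon_sol(delta, c1, c2):
    # positive root of c1*eps + c2*sqrt(eps) = delta, a quadratic in sqrt(eps)
    return ((-c2 + math.sqrt(c2 ** 2 + 4 * delta * c1)) / (2 * c1)) ** 2

c1, c2, delta = 2.0, 1.5, 0.05         # illustrative values only
eps = epsilon_sol(delta, c1, c2)
assert abs(c1 * eps + c2 * math.sqrt(eps) - delta) < 1e-12
print(eps)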

It is sought what ε_(sol) looks like for Δ_(η,ρ) near 0. Noticing that ε_(sol) has the shape Δ_(η,ρ)² for Δ_(η,ρ) near 0, it is desired to find the limit of ε_(sol)/Δ_(η,ρ)² as Δ_(η,ρ) goes to zero. Using L'Hospital's rule, one gets that this limiting value is 1/c₂². Correspondingly, using that L_(π) is near Lν/(2C), one gets that the error probability is near exp{−LΔ_(η,ρ)²/ξ₀}, for Δ_(η,ρ) near 0, where ξ₀=(2C/ν)c₂². This quantity behaves like snr²C for large snr and has the limiting value of (1+4/√(log B))²/2 for snr tending to 0.

A simplified expression for ε_(sol) is now given. To simplify it, lower bound the function −a+√(a²+x), with x≧0, by a function of the form min{α√x, βx}. It is seen that

${{- a} + \sqrt{a^{2} + x}} \geq {\alpha\sqrt{x}\mspace{14mu}{for}\mspace{14mu} x} \geq \frac{4\alpha^{2}a^{2}}{\left( {1 - \alpha^{2}} \right)^{2}}$

${{and}\mspace{14mu}{- a} + \sqrt{a^{2} + x}} \geq {\beta\; x\mspace{14mu}{for}\mspace{14mu} x} \leq {\frac{1 - {2\beta\; a}}{\beta^{2}}.}$

Clearly, for the above to have any meaning one requires 0<α<1 and 0<β<1/(2a). Further, it is seen that

$\begin{matrix}{{\min\left\{ {{\alpha\sqrt{x}},{\beta\; x}} \right\}} = {{\alpha\sqrt{x}\mspace{14mu}{for}\mspace{14mu} x} \geq \left( {\alpha/\beta} \right)^{2}}} \\{= {{\beta\; x\mspace{14mu}{for}\mspace{14mu} x} \leq {\left( {\alpha/\beta} \right)^{2}.}}}\end{matrix}$

Correspondingly, equating (α/β)² with 4α²a²/(1−α²)², or equivalently equating (α/β)² with (1−2βa)/β², one gets that 1−α²=2aβ.

Now return to the problem of lower bounding ε_(sol). Set a=c₂ and x=4Δ_(η,ρ)c₁. Also, particular choices of β and α are set to simplify the analysis. Take β=1/(4a), for which α=1/√2. Then the above gives that

$ɛ_{sol} \geq \frac{\left( {\min\left\{ {{\alpha\sqrt{4\Delta_{\eta,\rho}c_{1}}},{{\beta 4\Delta}_{\eta,\rho}c_{1}}} \right\}} \right)^{2}}{4\; c_{1}^{2}}$ which simplifies to ε_(sol)≧min{Δ_(η,ρ)/(2c₁), Δ_(η,ρ)²/(4c₂²)}.
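A quick numerical check of the underlying bound, with a as an assumed example value and the choices β=1/(4a), α=1/√2 used above:

import math

a = 3.0                                  # illustrative value only
alpha, beta = 1 / math.sqrt(2), 1 / (4 * a)

for x in (1e-4, 0.1, 1.0, 10.0, 1e4):
    lhs = -a + math.sqrt(a ** 2 + x)
    rhs = min(alpha * math.sqrt(x), beta * x)
    assert lhs >= rhs, (x, lhs, rhs)
print("bound holds on the sampled grid")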

From Theorem 38, one gets that the error probability is bounded by κe^(−L_(π)ε_(sol)), which from the above can also be bounded by the more simplified expression κexp{−L_(π)min{Δ_(η,ρ)/(2c₁), Δ_(η,ρ)²/(4c₂²)}}. It is desired to express this bound in the form κexp{−Lmin{Δ_(η,ρ)/ξ₁, Δ_(η,ρ)²/ξ₂}} for some ξ₁, ξ₂. Using the fact that L_(π) is near Lν/(2C), one gets that ξ₁ is (2C/ν)(2c₁), which gives ξ₁=(1+snr)(6C+1). One sees that ξ₁ goes to 1 as snr tends to zero. Further, ξ₂=(2C/ν)4c₂², which behaves like 4C snr² for large snr. It has the limiting value of 2(1+4/√(log B))² as snr tends to zero.

Improvement for Rates Near Capacity Using Bernstein Bounds:

The improved error bound associated with correct detection is given by

${\exp\left\{ {- \frac{\eta^{2}}{2\left( {V_{tot} + {\eta/\left( {3\; L_{\pi}} \right)}} \right)}} \right\}},$ where V_(tot)=V/L, with V≦c̃_(υ), where c̃_(υ)=(4C/ν²)(a₁+a₂/τ²)/τ. For small η, that is, for rates near the rate envelope, the bound behaves like

$\exp{\left\{ {{- L}\frac{\eta^{2}}{2\; V}} \right\}.}$

Consequently, for such η the exponent is,

$ɛ = {\frac{1}{d_{1}}{\frac{\eta^{2}}{2\;{\overset{\sim}{c}}_{v}}.}}$ Here d₁=L_(π)/L. This corresponds to η=√d₂ √ε, where d₂=2d₁c̃_(υ). Using that c̃_(υ)=(4C/ν²)(a₁/τ), that d₁=ν/(2C), and that τ≧√(2 log B), one gets that d₂≦1.62/(ν√(log B)). Substituting this upper bound for η in the expression for Δ_(η,ρ), it follows that

$\Delta_{\eta,\rho} \leq {{\overset{\sim}{c}}_{1}ɛ} + {{\overset{\sim}{c}}_{2}\sqrt{ɛ}}$, with c̃₁=θ/2 and ${\overset{\sim}{c}}_{2} = {\left\lbrack {{\sqrt{d_{2}}\vartheta_{1}} + \sqrt{2{\vartheta/\log}\; B} + \sqrt{\frac{v}{\log\; B}}} \right\rbrack.}$

Consequently, using the same reasoning as above, one gets that with the Bernstein bound, for rates close to capacity, the error exponent is like exp{−LΔ_(η,ρ)²/ξ̃₀}, for Δ_(η,ρ) near 0, where ξ̃₀=(2C/ν)c̃₂². This quantity behaves like 2d₂snr²C for large snr. Further, 2d₂ is near 3.24/√(log B) for such snr. Notice that now the error exponent is proportional to L√(log B)Δ_(η,ρ)², instead of the LΔ_(η,ρ)² as before. We see that for B>36300, the quantity 3.24/√(log B) is less than one, producing a better exponent than before for rates near capacity and for larger snr than before.
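As a quick arithmetic check of the quoted threshold: 3.24/√(log B)<1 exactly when log B>3.24², i.e. B>e^(3.24²)≈3.6×10⁴, consistent with the B>36300 figure above:

import math

print(math.exp(3.24 ** 2))   # approximately 36231, so B > 36300 suffices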

12 Optimizing Parameters for Rate and Exponent for No Leveling Using the 1−xν Factor

From Corollary 20 one gets that

${GAP} = {\frac{r - r_{up}}{v\left( {\tau^{2} + r} \right)}.}$ Simplifying, one gets 1+r/τ²=(1+r_(up)/τ²)/(1−νGAP).
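This identity is exact, as a quick numerical check illustrates (r, r_(up), τ² and ν are arbitrary test values):

# verify: with GAP = (r - r_up)/(nu*(tau^2 + r)),
# 1 + r/tau^2 equals (1 + r_up/tau^2)/(1 - nu*GAP)
r, r_up, tau2, nu = 2.0, 1.2, 50.0, 0.8     # illustrative values only
gap = (r - r_up) / (nu * (tau2 + r))
assert abs((1 + r / tau2) - (1 + r_up / tau2) / (1 - nu * gap)) < 1e-12
print(gap)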

Recall that the rate assigned in the analysis herein of the sparse superposition inner code is

$R = {\frac{\left( {1 - h^{\prime}} \right)C}{\left( {1 + \delta_{a}} \right)^{2}\left( {1 + \delta_{sum}^{2}} \right)\left( {1 + {r/\tau^{2}}} \right)}.}$ Here the terms involved in the leveling case are also included, even though it is the no leveling case being considered. This will be useful later on when generalizing to the case with leveling. Further, with the Reed-Solomon outer code of rate 1−δ_(mis), which corrects the remaining fraction of mistakes, the total rate of the code is

$R_{tot} = {\frac{\left( {1 - \delta_{mis}} \right)\left( {1 - h^{\prime}} \right)C}{\left( {1 + \delta_{sum}^{2}} \right)\left( {1 + \delta_{a}} \right)^{2}\left( {1 + {r/\tau^{2}}} \right)}.}$ which using the above is equal to

$R_{tot} = {\frac{\left( {1 - \delta_{mis}} \right)\left( {1 - h^{\prime}} \right)\left( {1 - {v\;{GAP}}} \right)C}{\left( {1 + \delta_{sum}^{2}} \right)\left( {1 + \delta_{a}} \right)^{2}\left( {1 + {r_{up}/\tau^{2}}} \right)}.}$

Lemma 40.

Additive representation of rate drop: With a non-negative value for GAP less than 1/ν, the rate R_(tot) is at least (1−Δ)C with Δ given by

$\Delta = {\frac{{snr}\;\delta^{*}}{\left( {1 + \delta_{c}} \right)2\; C} + \frac{r_{up}}{\tau^{2}} + \frac{D\left( \delta_{c} \right)}{snr} + {\frac{snr}{2\; C}\left( {\eta + \overset{\_}{f}} \right)} + {v\;{GAP}} + h^{\prime} + {2\delta_{a}} + {\frac{2\; C}{Lv}.}}$

Demonstration of Lemma 40:

Notice that

$\frac{\left( {1 - \delta_{mis}} \right)\left( {1 - h^{\prime}} \right)\left( {1 - {v{GAP}}} \right)}{\left( {1 + \delta_{sum}^{2}} \right)\left( {1 + \delta_{a}} \right)^{2}\left( {1 + {r_{up}\text{/}\tau^{2}}} \right)}$ is at least

$\frac{\left( {1 - h^{\prime}} \right)\left( {1 - {v{GAP}}} \right)}{\left( {1 + \delta_{sum}^{2}} \right)\left( {1 + \delta_{a}} \right)^{2}\left( {1 + {r_{up}\text{/}\tau^{2}}} \right)}$ minus δ_(mis)/(1+δ_(sum)²). As before, the ratio δ_(mis)/(1+δ_(sum)²) subtracted here is equal to

$\frac{snr}{2C}{\frac{\left( {\delta^{*} + \eta + \overset{\_}{f}} \right)}{\left( {1 - \delta_{c}} \right)}.}$ Further, the first part of the difference is at least 1−h′−νGAP−δ_(sum)²−2δ_(a)−r_(up)/τ². Further, using δ_(sum)²≦D(δ_(c))/snr+2C/(Lν) one gets the result. This completes the demonstration of Lemma 40.

The second line of the rate drop is given by

${{\frac{snr}{2C}\left( {{\eta\left( x^{*} \right)} + \overset{\_}{f}} \right)} + {v\;{GAP}} + h^{\prime} + {2\delta_{a}} + \frac{2C}{L\; v}},$ where η(x*)=(1−x*ν)η^(std) and GAP=η^(std)+log[1/(1−x*)]/(m−1). Thus νGAP is equal to νη^(std)+νc(x*)/(m−1), where c(x*)=log[1/(1−x*)].

Case 1:

h′=h+h_(f̄): Now optimize the second line of the rate drop when h′=h+h_(f̄). This leads to the following lemma.

Lemma 41.

Optimization of the second line of Δ. For any given positive η providing the exponent ε_(η) of the error probability, the values of the parameters m, f̄* are specified to optimize their effect on the communication rate. The second line Δ_(second) of the total rate drop (C−R)/C bound Δ is the sum of three terms Δ_(m)+Δ_(f̄*)+Δ_(η(x*)), plus the negligible Δ_(L)=2C/(Lν)+(m−1)C/(L log B)+ε₃. Here

$\Delta_{m} = {{v\frac{c\left( x^{*} \right)}{m - 1}} + \frac{\log\; m}{\log\; B}}$ is optimized at a number of steps m equal to an integer part of 2+νc(x*)log B, at which Δ_(m) is not more than

$\frac{1}{\log\; B} + \frac{\log\left( {2 + {v\;{c\left( x^{*} \right)}\log\; B}} \right)}{\log\; B}$ Likewise Δ_(f̄*) is given by

${{\vartheta\;{\overset{\_}{f}}^{*}} - \frac{\log\left( {{\overset{\_}{f}}^{*}\sqrt{2\pi}\sqrt{2\;\log\; B}} \right)}{\log\; B}},$ where θ=snr(2+1/(2C)). The above is optimized at the false alarm level f̄*=1/[θ log B], at which

$\Delta_{{\overset{\_}{f}}^{*}} = {\frac{1}{\log\; B} + {\frac{\log\left( {\vartheta{\sqrt{\log\; B}/\sqrt{4\pi}}} \right)}{\log\; B}.}}$ The Δ_(η(x*)) is given by Δ_(η(x*))=η^(std)[ν+(1−x*ν)snr/(2C)]+(ρ−1)/log B+h, which is bounded by η^(std)θ₁+(ρ−1)/log B+h, where θ₁=ν+snr/(2C).

Remark:

Since 1−x*=r/(snrτ²) and r>r_(up), one has that 1−x*≧r_(up)/(snrτ²). Correspondingly, c(x*) is at most log(snr)+log(τ²/r_(up)). The optimum number of steps can be bounded accordingly.

Demonstration:

Club all terms involving the number of steps m to get the expression for Δ_(m). It is then seen that optimization of Δ_(m) gives the expression as in the proof statement.

Next, write

${2\delta_{a}} = {\frac{\log\left\lbrack {m\text{/}\left( {{\overset{\_}{f}}^{*}\sqrt{2\pi}\sqrt{2\log\; B}} \right)} \right\rbrack}{\log\; B}.}$ Further, write f̄ as (ρ−1)f̄*+f̄* in the terms involving f̄. For example, h_(f̄)=2snr f̄ is written as the sum of 2snr(ρ−1)f̄* plus 2snr f̄*. Now club all terms involving only f̄* (that is, not (ρ−1)f̄*) into Δ_(f̄*). The result is Δ_(f̄*) equal to

${{\vartheta\;{\overset{\_}{f}}^{*}} - \frac{\log\left( {{\overset{\_}{f}}^{*}\sqrt{2\pi}\sqrt{2\;\log\; B}} \right)}{\log\; B}},$ optimized at f̄*=1/[θ log B], which determines a value of δ_(a) for the rate drop envelope independent of η.

The remaining terms are absorbed to give the expression for Δ_(η(x*)). Thus get that Δ_(η(x*)) is equal to η^(std)[ν+(1−x*ν)snr/(2C)]+(ρ−1)/log B+h*. The bound on Δ_(η(x*)) follows from using 1−x*ν≦1.

Error Exponent:

Here it is preferred to use Bernstein bounds for the error bounds associated with correct detection. Recall that for rates near the rate envelope, that is, for η(x*) close to 0, the exponent is near

$ɛ = {\frac{1}{d_{1}}{\frac{\left( \eta^{std} \right)^{2}}{2{\overset{\sim}{c}}_{\upsilon}}.}}$ As before, this corresponds to η^(std)=√d₂ √ε, where d₂=2d₁c̃_(υ). Substituting this upper bound for η^(std) in the expression for Δ_(η(x*)), one gets that

$\Delta_{\eta,\rho} \leq {{\overset{\sim}{c}}_{1}ɛ} + {{\overset{\sim}{c}}_{2}\sqrt{ɛ}}$, with c̃₁=θ/2 and ${\overset{\sim}{c}}_{2} = {\left\lbrack {{\sqrt{d_{2}}\vartheta_{1}} + \sqrt{2{\vartheta/\log}\; B} + \sqrt{\frac{v}{\log\; B}}} \right\rbrack.}$

Consequently, using the same reasoning as above, one gets that with the Bernstein bound, for rates close to capacity, the error exponent is like exp{−LΔ_(η,ρ)²/ξ̃₀}, for Δ_(η,ρ) near 0, where ξ̃₀=(2C/ν)c̃₂². For small snr, this quantity is near (√d₂+√(2/log B))². This quantity behaves like d₂snr²/(2C) for large snr. For large snr this is the same as what is obtained in the previous section.

Case 2:

1−h′=(1−h)^(m−1)/(1+h)^(m−1). It is easy to see that this implies that h′≦2mh. The corresponding Lemma for optimization of the rate drop for such h′ is presented.

Lemma 42.

Optimization of the second line of Δ. For any given positive η providing the exponent ε_(η) of the error probability, the values of the parameters m, f̄* are specified to optimize their effect on the communication rate. The second line Δ_(second) of the total rate drop (C−R)/C bound Δ is the sum of three terms Δ_(m)+Δ_(f̄*)+Δ_(η(x*)), plus the negligible Δ_(L)=2C/(Lν)+(m−1)C/(L log B)+ε₃. Here

$\Delta_{m} = {{v\frac{c\left( x^{*} \right)}{m - 1}} + \frac{\log\; m}{\log\; B}}$ is optimized at a number of steps m* equal to an integer part of 2+νc(x*)log B, at which Δ_(m) is not more than

$\frac{1}{\log\; B} + \frac{\log\left( {2 + {v\;{c\left( x^{*} \right)}\log\; B}} \right)}{\log\; B}$ Likewise Δ_(f̄*) is given by

${{\vartheta\;{\overset{\_}{f}}^{*}} - \frac{\log\left( {{\overset{\_}{f}}^{*}\sqrt{2\pi}\sqrt{2\;\log\; B}} \right)}{\log\; B}},$ where θ=snr/(2C). The above is optimized at the false alarm level f̄*=1/[θ log B], at which

$\Delta_{{\overset{\_}{f}}^{*}} = {\frac{1}{\log\; B} + {\frac{\log\left( {\vartheta{\sqrt{\log\; B}/\sqrt{4\pi}}} \right)}{\log\; B}.}}$ The Δ_(η(x*)) is given by Δ_(η(x*))=η^(std)[ν+(1−x*ν)snr/(2C)]+(ρ−1)/log B+2m*h, which is bounded by η^(std)θ₁+(ρ−1)/log B+2m*h, where θ₁=ν+snr/(2C).

Error Exponent:

Exactly as before, use Bernstein bounds for the correct detection error probabilities to get that Δ_(η,ρ)≦c̃₁ε+c̃₂√ε, with c̃₁=θ/2 and

${\overset{\sim}{c}}_{2} = {\left\lbrack {{\sqrt{d_{2}}\vartheta_{1}} + \sqrt{2{\vartheta/\log}\; B} + {2m^{*}\sqrt{\frac{v}{\log\; B}}}} \right\rbrack.}$ Notice that since m*=2+νc(x*)log B, one has that

${\overset{\sim}{c}}_{2} = {\left\lbrack {{\sqrt{d_{2}}\vartheta_{1}} + \sqrt{2{\vartheta/\log}\; B} + {4\sqrt{\frac{v}{\log\; B}}} + {2{c\left( x^{*} \right)}v^{3/2}\sqrt{\log\; B}}} \right\rbrack.}$ As before, the error exponent is like exp{−LΔ_(η,ρ)²/ξ̃₀}, for Δ_(η,ρ) near 0, where ξ̃₀=(2C/ν)c̃₂².

Comparison of Envelope and Exponent for the Two Methods with and without Factoring the 1−xν Term:

First concentrate attention on the envelope, which is given by

$\frac{1}{\log\; B} + \frac{\log\; m^{*}}{\log\; B} + \frac{1}{\log\; B} + {\frac{\log\left( {\vartheta{\sqrt{\log\; B}/\sqrt{4\pi}}} \right)}{\log\; B}.}$ For the first method, without factoring out the 1−xν term, θ=snr(3+1/(2C)), whereas for the second method (Case 2) it is snr/(2C). The optimum number of steps m* is 2+snr log B for the first method and 2+νc(x*)log B for the second. Here c(x*) is log(snr) plus a term of order log log B. Correspondingly, the envelope is smaller for the second method.

Next, observe that the quantity c₂ determines the error exponent. The smaller the c₂, the better the exponent. For the first method it is

$\left\lbrack {\frac{\vartheta_{1}}{\sqrt{2}} + \sqrt{2{\vartheta/\log}\; B} + \sqrt{\frac{v}{\log\; B}}} \right\rbrack,$ where θ₁=snr(1+1/(2C)). Further, θ is as given in the previous paragraph. For the second method it is given by

$\left\lbrack {\frac{\vartheta_{1}}{\sqrt{2}} + \sqrt{2{\vartheta/\log}\; B} + {2m^{*}\sqrt{\frac{v}{\log\; B}}}} \right\rbrack,$ where θ₁=ν+snr/(2C). It is seen that for larger snr the latter is less, producing a better exponent. To see this, notice that as a function of snr, the first term in c₂, i.e. θ₁/√2, behaves like snr for the first case (without factorization of 1−xν) and like snr/(2C) for the second case. The second term in c₂ is like √snr for the first case and like √(snr/C) for the second. The third is near √(1/log B) for the former case and behaves like log(snr) in the latter case. Consequently, it is the first term in c₂ which determines its behavior for larger snr in both cases. Since θ₁ is smaller in the second case, it is inferred that the second method is better for larger snr.

13 Composition with an Outer Code

Use Reed-Solomon (RS) codes (Reed and Solomon, SIAM 1960), as described for instance in the book of Lin and Costello (2004), to correct any remaining mistakes from the adaptive successive decoder. The symbols for the RS code can be associated with those of a Galois field, say consisting of q elements and denoted by GF(q). Here q is typically taken to be of the form of a power of two, say 2^(m). Let K_(out), n_(out) be the message length and blocklength respectively for the RS code. Further, if d_(RS) is the minimum distance between the codewords, then an RS code with symbols in GF(2^(m)) can have the following parameters: n_(out)=2^(m) and n_(out)−K_(out)=d_(RS)−1.

Here n_(out)−K_(out) gives the number of parity check symbols added to the message to form the codeword. In what follows it is convenient to take B equal to 2^(m), so that one can view each symbol in GF(2^(m)) as giving a number between 1 and B.

Now it is demonstrated how the RS code can be used as an outer code in conjunction with the inner superposition code to achieve low block error probability. For simplicity assume that B is a power of 2. First consider the case when L equals B. Taking m=log₂ B, one sees that since L is equal to B, the RS codelength becomes L. Thus, one can view each symbol as representing an index specifying the selected term in each of the L sections. The number of input symbols is then K_(out)=L−d_(RS)+1, so setting δ=d_(RS)/L one sees that the outer rate R_(out)=K_(out)/n_(out) equals 1−δ+1/L, which is at least 1−δ.

For code composition, K_(out) log₂ B message bits become the K_(out) input symbols to the outer code. The symbols of the outer codeword, having length L, give the labels of terms sent from each section using the inner superposition code with codelength n=L log₂ B/R_(inner). From the received Y, the estimated labels ĵ₁, ĵ₂, . . . , ĵ_(L) from the adaptive successive decoder can again be thought of as output symbols for the RS code. If δ̂_(e) denotes the section mistake rate, it follows from the distance property of the outer code that if 2δ̂_(e)≦δ then these errors can be corrected. The overall rate R_(comp) is seen to be equal to the product of rates R_(out)R_(inner), which is at least (1−δ)R_(inner). Since it is arranged for δ̂_(e) to be smaller than some δ_(mis) with exponentially small probability, it follows from the above that composition with an outer code allows one to communicate with the same reliability, albeit with a slightly smaller rate given by (1−2δ_(mis))R_(inner).
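The rate accounting just described can be sketched numerically; L=B, d_(RS) and R_(inner) below are assumed example values, not parameters from the source:

# Composition of the inner superposition code with an RS outer code, L = B case.
L = B = 256                      # sections; RS symbols taken in GF(B)
d_RS = 17                        # assumed minimum distance of the outer code
K_out = L - d_RS + 1             # RS message length
delta = d_RS / L
R_out = K_out / L                # equals 1 - delta + 1/L >= 1 - delta
R_inner = 1.0                    # assumed inner code rate (bits per channel use)

R_comp = R_out * R_inner         # composite rate, at least (1 - delta)*R_inner
max_mistake_rate = delta / 2     # section mistake rates up to delta/2 are corrected
print(R_comp, max_mistake_rate)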

The case when L<B can be dealt with by observing (as in Lin and Costello, page 240) that an (n_(out), K_(out)) RS code as above can be shortened by length w, where 0≦w<K_(out), to form an (n_(out)−w, K_(out)−w) code with the same minimum distance d_(RS) as before. This is seen by viewing each codeword as being created by appending n_(out)−K_(out) parity check symbols to the end of the corresponding message string. Then the code formed by considering the set of codewords with the w leading symbols identical to zero has precisely the properties stated above.

With B equal to 2^(m) as before, the n_(out) is set to equal B, so taking w to be B−L we get an (n_(out)′, K_(out)′) code, with n_(out)′=L, K_(out)′=L−d_(RS)+1 and minimum distance d_(RS). Now since the codelength is L and the symbols of this code are in GF(B), the code composition can be carried out as before.

14 Appendix

14.1 Distribution of 𝒵_(k,j)

Consider the general k>2 case. Focus on the sequence of coefficients 𝒵_(1,j), 𝒵_(2,j), . . . , 𝒵_(k−1,j), V_(k,k,j), V_(k+1,k,j), . . . , V_(n,k,j) used to represent X_(j) for j in J_(k−1) in the basis

$\frac{G_{1}}{\left\| G_{1} \right\|},\frac{G_{2}}{\left\| G_{2} \right\|},\ldots\;,\frac{G_{k - 1}}{\left\| G_{k - 1} \right\|},\xi_{k,k},\xi_{{k + 1},k},\ldots,\xi_{n,k},$ where the ξ_(i,k) for i from k to n are orthonormal vectors in ℝ^(n), orthogonal to the G₁, G₂, . . . , G_(k−1). These are associated with the previously described representation X_(j)=Σ_(k′=1)^(k−1)𝒵_(k′,j)G_(k′)/∥G_(k′)∥+V_(k,j), except that here V_(k,j) is represented as Σ_(i=k)^(n)V_(i,k,j)ξ_(i,k).

Let's prove that, conditional on ℱ_(k−1), the distribution of the V_(i,k,j) is independent across i from k to n, and for each such i the joint distribution of (V_(i,k,j): j∈J_(k−1)) is Normal N_(J_(k−1))(0,Σ_(k−1)). The proof is by induction in k. Along the way the conditional distribution properties of G_(k), Z_(k,j), and 𝒵_(k,j) are obtained as consequences. As for ŵ_(k) and δ_(k), the induction steps provide recursions which permit verification of the stated forms.

The V_(i,1,j)=X_(i,j) are independent standard normals.

To analyze the k=2 case, use the vectors U_(1,j)=U_(j) that arise in the first step properties in the proof of Lemma 1. There it is seen, for unit vectors α, that the U_(j)^(T)α for j∈J₁ have a joint N_(J₁)(0,Σ₁) distribution, independent of Y. When represented using the orthonormal basis Y/∥Y∥, ξ_(2,2), . . . , ξ_(n,2), the vector U_(j) has coefficients Z_(j)=U_(j)^(T)Y/∥Y∥, and U_(j)^(T)ξ_(2,2) through U_(j)^(T)ξ_(n,2). Accordingly N_(j)=b_(1,j)Y/σ+U_(j) has representation in this basis with the same coefficients, except in the direction Y/∥Y∥ where Z_(j) is replaced by 𝒵_(j)=b_(1,j)∥Y∥/σ+Z_(j). The joint distribution of (V_(i,2,j)=U_(j)^(T)ξ_(i,2): j∈J₁) is Normal N_(J₁)(0, Σ₁), independently for i=2 to n, and independent of ∥Y∥ and (Z_(j): j∈J₁).

Proceed inductively for k≧2: presuming the stated conditional distribution property of the V_(i,k,j) to be true at k, conduct analysis to demonstrate its validity at k+1.

From the representation of V_(k,j) in the basis given above, the G_(k) has representation in the same basis as G_(i,k)=Σ_(j∈dec_(k−1))√P_(j) V_(i,k,j) for i from k to n. The coordinates less than k are 0, since the V_(k,j) and G_(k) are orthogonal to G₁, . . . , G_(k−1). The value of 𝒵_(k,j) is V_(k,j)^(T)G_(k)/∥G_(k)∥, where the inner product (and norm) may be computed in the above basis from sums of products of coefficients for i from k to n.

For the conditional distribution of G_(i,k) given ℱ_(k−1), independence across i, conditional normality and conditional mean 0 are properties inherited from the corresponding properties of the V_(i,k,j). To obtain the conditional variance of G_(i,k)=Σ_(j∈dec_(k−1))√P_(j) V_(i,k,j), use the conditional covariance Σ_(k−1)=I−δ_(k−1)δ_(k−1)^(T) of the V_(i,k,j) for j in J_(k−1). The identity part contributes Σ_(j∈dec_(k−1))P_(j), which is (q̂_(k−1)+f̂_(k−1))P; whereas the δ_(k−1)δ_(k−1)^(T) part, using the presumed form of δ_(k−1), contributes an amount seen to equal ν_(k−1)[Σ_(j∈sent∩dec_(k−1))P_(j)/P]²P, which is ν_(k−1)q̂_(k−1)²P. It follows that the conditional expected square for the coefficients of G_(k) is σ_(k)²=[q̂_(k−1)+f̂_(k−1)−q̂_(k−1)²ν_(k−1)]P.

Moreover, conditional on ℱ_(k−1), the distribution of ∥G_(k)∥²=Σ_(i=k)^(n)G_(i,k)² is that of σ_(k)²χ²_(n−k+1), a multiple of a Chi-square with n−k+1 degrees of freedom.

Next represent V_(k,j)=b_(k,j)G_(k)/σ_(k)+U_(k,j), using a value of b_(k,j) that follows an update rule (depending on ℱ_(k−1)). It is represented using V_(i,k,j)=b_(k,j)G_(i,k)/σ_(k)+U_(i,k,j) for i from k to n, using the basis built from the ξ_(i,k).

The coefficient b_(k,j) is the value 𝔼[V_(i,k,j)G_(i,k)|ℱ_(k−1)]/σ_(k). Consider the product V_(i,k,j)G_(i,k) in the numerator. Use the representation of G_(i,k) as a sum of the √P_(j′) V_(i,k,j′) for j′∈dec_(k−1). Accordingly, the numerator is Σ_(j′∈dec_(k−1))√P_(j′)[1_(j′=j)−δ_(k−1,j)δ_(k−1,j′)], which simplifies to √P_(j)[1_(j∈dec_(k−1))−ν_(k−1)q̂_(k−1)1_(j sent)]. So for j in J_(k)=J_(k−1)−dec_(k−1), there is the simplification

${b_{k,j} = {- \frac{{\hat{q}}_{k - 1}v_{k - 1}\beta_{j}}{\sigma_{k}}}},$ for which the product for j, j′ in J_(k) takes the form

${b_{k,j}b_{k,j^{\prime}}} = {\delta_{{k - 1},j}\delta_{{k - 1},j^{\prime}}{\frac{{\hat{q}}_{k - 1}v_{k - 1}}{1 + {{\hat{f}}_{k - 1}/{\hat{q}}_{k - 1}} - {{\hat{q}}_{k - 1}v_{k - 1}}}.}}$ Here the ratio simplifies to q̂_(k−1)^(adj)ν_(k−1)/(1−q̂_(k−1)^(adj)ν_(k−1)).

Now determine the features of the joint normal distribution of the U_(i,k,j)=V_(i,k,j)−b_(k,j)G_(i,k)/σ_(k) for j∈J_(k), given ℱ_(k−1). These random variables are conditionally uncorrelated and hence conditionally independent, given ℱ_(k−1), across choices of i, but there is covariance across choices of j for fixed i. This conditional covariance 𝔼[U_(i,k,j)U_(i,k,j′)|ℱ_(k−1)], by the choice of b_(k,j), reduces to 𝔼[V_(i,k,j)V_(i,k,j′)|ℱ_(k−1)]−b_(k,j)b_(k,j′) which, for j,j′∈J_(k), is 1_(j=j′)−δ_(k−1,j)δ_(k−1,j′)−b_(k,j)b_(k,j′). That is, for each i, the (U_(i,k,j): j∈J_(k)) have the joint N_(J_(k))(0,Σ_(k)) distribution, conditional on ℱ_(k−1), where Σ_(k) again takes the form 1_(j,j′)−δ_(k,j)δ_(k,j′), where

${{\delta_{k,j}\delta_{k,j^{\prime}}} = {\delta_{{k - 1},j}\delta_{{k - 1},j^{\prime}}\left\{ {1 + \frac{{\hat{q}}_{k - 1}^{adj}v_{k - 1}}{1 - {{\hat{q}}_{k - 1}^{adj}v_{k - 1}}}} \right\}}},$ for j,j′ now restricted to J_(k). The quantity in braces simplifies to 1/(1−q̂_(k−1)^(adj)ν_(k−1)). Correspondingly, the recursive update rule for ν_(k) is

$v_{k} = {\frac{v_{k - 1}}{1 - {{\hat{q}}_{k - 1}^{adj}v_{k - 1}}}.}$

Consequently, the joint distribution for (Z_(k,j): j∈J_(k)) is determined, conditional on ℱ_(k−1). It is also the normal N(0,Σ_(k)) distribution, and (Z_(k,j): j∈J_(k)) is conditionally independent of the coefficients of G_(k), given ℱ_(k−1). After all, the Z_(k,j)=U_(k,j)^(T)G_(k)/∥G_(k)∥ have this N_(J_(k))(0,Σ_(k)) distribution, conditional on G_(k) and ℱ_(k−1), but since this distribution does not depend on G_(k) it yields the stated conditional independence.

Now 𝒵_(k,j)=X_(j)^(T)G_(k)/∥G_(k)∥ reduces to V_(k,j)^(T)G_(k)/∥G_(k)∥ by the orthogonality of the G₁ through G_(k−1) components of N_(j) with G_(k). So using the representation V_(k,j)=b_(k,j)G_(k)/σ_(k)+U_(k,j), one obtains 𝒵_(k,j)=b_(k,j)∥G_(k)∥/σ_(k)+Z_(k,j). This makes the conditional distribution of the 𝒵_(k,j), given ℱ_(k−1), close to but not exactly normally distributed; rather it is a location mixture of normals, with the distribution of the shift of location determined by the Chi-square distribution of χ²_(n−k+1)=∥G_(k)∥²/σ_(k)². Using the form of b_(k,j), for j in J_(k), the location shift b_(k,j)χ_(n−k+1) may be written −√(ŵ_(k)C_(j,R,B))[χ_(n−k+1)/√n]1_(j sent), where ŵ_(k) equals nb_(k,j)²/C_(j,R,B). The numerator and denominator have dependence on j through P_(j), so canceling the P_(j) produces a value for ŵ_(k). Indeed, C_(j,R,B)=(P_(j)/P)ν(L/R)log B equals n(P_(j)/P)ν, and b_(k,j)²=(P_(j)/P)q̂_(k−1)^(adj)ν_(k−1)²/[1−q̂_(k−1)^(adj)ν_(k−1)]. So this ŵ_(k) may be expressed as

${{\hat{w}}_{k} = {\frac{v_{k - 1}}{v}\frac{{\hat{q}}_{k - 1}^{adj}v_{k - 1}}{1 - {{\hat{q}}_{k - 1}^{adj}v_{k - 1}}}}},$which, using the update rule for ν_(k−1), is seen to equal

${\hat{w}}_{k} = {\frac{v_{k} - v_{k - 1}}{v}.}$
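A small sketch of these recursions, with ν and the per-step adjusted fractions q̂_(k)^(adj) as assumed illustrative values; it checks numerically that the direct expression for ŵ_(k) above matches the increment (ν_(k)−ν_(k−1))/ν implied by the update rule for ν_(k):

nu = 0.85                                # illustrative value only
q_adj = [0.30, 0.25, 0.15]               # assumed adjusted detection fractions

nu_prev = nu                             # nu_1 = nu
for q in q_adj:
    nu_k = nu_prev / (1 - q * nu_prev)   # the stated update rule for nu_k
    w_direct = (nu_prev / nu) * q * nu_prev / (1 - q * nu_prev)
    w_incr = (nu_k - nu_prev) / nu
    assert abs(w_direct - w_incr) < 1e-12
    nu_prev = nu_k
print(nu_prev)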

Armed with G_(k), update the orthonormal basis of ℝ^(n) used to represent X_(j), V_(k,j) and U_(k,j). From the previous step this basis was G₁/∥G₁∥, . . . , G_(k−1)/∥G_(k−1)∥ along with ξ_(k,k), ξ_(k+1,k), . . . , ξ_(n,k), where only the latter are needed for the V_(k,j) and U_(k,j), as their coefficients in the directions G₁, . . . , G_(k−1) are 0.

Now Gram-Schmidt makes an updated orthonormal basis of ℝ^(n), retaining the G₁/∥G₁∥, . . . , G_(k−1)/∥G_(k−1)∥ but replacing ξ_(k,k), ξ_(k+1,k), . . . , ξ_(n,k) with G_(k)/∥G_(k)∥, ξ_(k+1,k+1), . . . , ξ_(n,k+1). By the Gram-Schmidt construction process, these vectors ξ_(i,k+1) for i from k+1 to n are determined from the original basis vectors (columns of the identity) along with the computed random vectors G₁, . . . , G_(k), and do not depend on any other random variables in this development.

The coefficients of U_(k,j) in this updated basis are U_(k,j)^(T)G_(k)/∥G_(k)∥, U_(k,j)^(T)ξ_(k+1,k+1), . . . , U_(k,j)^(T)ξ_(n,k+1), which are denoted U_(k,k+1,j)=Z_(k,j) and U_(k+1,k+1,j), . . . , U_(n,k+1,j), respectively. Recalling the normal conditional distribution of the U_(k,j), these coefficients (U_(i,k+1,j): k≦i≦n, j∈J_(k)) are also normally distributed, conditional on ℱ_(k−1) and G_(k), independent across i from k to n (this independence being a consequence of their uncorrelatedness, due to the orthogonality of the ξ_(i,k+1) and the independence of the coefficients U_(i,k,j) across i in the original basis); moreover, as seen already for i=k, for each i from k to n, the (U_(i,k+1,j): j∈J_(k)) inherit a joint normal N(0,Σ_(k)) conditional distribution from the conditional distribution that the (U_(i,k,j): j∈J_(k)) have. After all, these coefficients have this conditional distribution conditioning on the basis vectors and ℱ_(k−1), and this conditional distribution is the same for all such basis vectors. So, in fact, these (U_(i,k+1,j): k≦i≦n, j∈J_(k)) are conditionally independent of the G_(k) given ℱ_(k−1).

Specializing the conditional distribution conclusion, by separating off the i=k case where the coefficients are Z_(k,j), one has that the (U_(i,k+1,j): k+1≦i≦n, j∈J_(k)) have the specified conditional distribution and are conditionally independent of G_(k) and (Z_(k,j): j∈J_(k)) given ℱ_(k−1). It follows that the conditional distribution of (U_(i,k+1,j): k+1≦i≦n, j∈J_(k)) given ℱ_(k)=(ℱ_(k−1),∥G_(k)∥,Z_(k)) is identified. It is normal N(0,Σ_(k)) for each i, independently across i from k+1 to n, conditionally given ℱ_(k).

Likewise, the vector V_(k,j)=b_(k,j)G_(k)/σ_(k)+U_(k,j) has representation in this updated basis with coefficient 𝒵_(k,j) in place of Z_(k,j) and with V_(i,k+1,j)=U_(i,k+1,j) for i from k+1 to n. So these coefficients (V_(i,k+1,j): k+1≦i≦n, j∈J_(k)) have the normal N(0,Σ_(k)) distribution for each i, independently across i from k+1 to n, conditionally given ℱ_(k).

Thus the induction is established, verifying that this conditional distribution property holds for all k=1, 2, . . . , n. Consequently, the Z_(k) and ∥G_(k)∥ have the claimed conditional distributions.

Finally, repeatedly apply ν_(k′)/ν_(k′−1)=1/(1−q̂_(k′−1)^(adj)ν_(k′−1)), for k′ from k down to 2, each time substituting the required expression on the right and simplifying, to obtain

$\frac{v_{k}}{v_{k - 1}} = {\frac{1 - {\left( {{\hat{q}}_{1}^{adj} + \ldots + {\hat{q}}_{k - 2}^{adj}} \right)v}}{1 - {\left( {{\hat{q}}_{1}^{adj} + \ldots + {\hat{q}}_{k - 2}^{adj} + {\hat{q}}_{k - 1}^{adj}} \right)v}}.}$ This yields ν_(k)=νŝ_(k), which, when plugged into the expressions for ŵ_(k), establishes the claims. The proof of Lemma 2 is complete.

14.2 The Method of Nearby Measures

Recall that the Rényi relative entropy of order α>1 (also known as the α divergence) of two probability measures ℙ and ℚ with density functions p(Z) and q(Z) for a random vector Z is given by

$D_{\alpha}\left( {\mathbb{P}\parallel\mathbb{Q}} \right) = {\frac{1}{\alpha - 1}\log\;{\mathbb{E}_{\mathbb{Q}}\left\lbrack \left( {{p(Z)}/{q(Z)}} \right)^{\alpha} \right\rbrack}.}$ Its limit for large α is D_(∞)(ℙ∥ℚ)=log∥p/q∥_(∞).

Lemma 43.

Let ℙ and ℚ be a pair of probability measures with finite D_(α)(ℙ∥ℚ). For any event A, and α>1,

ℙ[A]≦[ℚ[A]e^(D_(α)(ℙ∥ℚ))]^((α−1)/α).

If D_(α)(ℙ∥ℚ)≦c₀ for all α, then the following bound holds, taking the limit of large α,

ℙ[A]≦ℚ[A]e^(c₀).

In this case the density ratio p(Z)/q(Z) is uniformly bounded by e^(c₀).

Demonstration of Lemma 43:

For convex f, as in Csiszár's f-divergence inequality, from Jensen's inequality applied to the decomposition of 𝔼_(ℚ)[f(p(Z)/q(Z))] using the distributions conditional on A and its complement,

ℚ[A]f(ℙ[A]/ℚ[A])+ℚ[A^(c)]f(ℙ[A^(c)]/ℚ[A^(c)])≦𝔼_(ℚ)f(p(Z)/q(Z)).

Using in particular f(r)=r^(α) and throwing out the non-negative A^(c) part yields

(ℙ[A])^(α)≦(ℚ[A])^(α−1)𝔼_(ℚ)[(p(Z)/q(Z))^(α)].

It is also seen as Hölder's inequality applied to ∫q(p/q)1_(A). Taking the α root produces the stated inequality.

Lemma 44.

Let

_(Z) be the joint normal N(0,Σ) distribution, with Σ=I−bb^(T) where∥b∥²=ν<1. Likewise, let

_(Z) be the distribution that makes the Z_(j) independent standardnormal. Then the Rènyi divergence is bounded. Indeed, for all 1≦α≦∞.D _(α)(

_(Z)∥

_(Z))≦c ₀.where c₀=−(½)log [1−ν]. With ν=P(σ²−P), this constant is c₀=(½)log[1−P/σ²].

Demonstration of Lemma 44:

Direct evaluation of the α divergence between N(0,Σ) and N(0,I) reveals the value

$D_{\alpha} = {{{- \frac{1}{2}}\log\left| \Sigma \right|} - {\frac{1}{2\left( {\alpha - 1} \right)}\log\left| {{\alpha\; I} - {\left( {\alpha - 1} \right)\Sigma}} \right|}}$

Expressing Σ=I−Δ, it simplifies to

${{- \frac{1}{2}}\log\left| {I - \Delta} \right|} - {\frac{1}{2\left( {\alpha - 1} \right)}\log\left| {I + {\left( {\alpha - 1} \right)\Delta}} \right|}$

The matrix Δ is equal to bb^(T), with b as previously specified with ∥b∥²=ν. The two matrices I−Δ and I+(α−1)Δ each take the form I+γbb^(T), with γ equal to −1 and (α−1) respectively.

The form I+γbb^(T) is readily seen to have one eigenvalue of 1+γν corresponding to the eigenvector b/∥b∥ and L−1 eigenvalues equal to 1 corresponding to eigenvectors orthogonal to the vector b. The log determinant is the sum of the logs of the eigenvalues, and so, in the present context, the log determinants arise exclusively from the one eigenvalue not equal to 1. This provides evaluation of D_(α) to be

${{{- \frac{1}{2}}{\log\left\lbrack {1 - v} \right\rbrack}} - {\frac{1}{2\left( {\alpha - 1} \right)}\log\left\lbrack {1 + {\left( {\alpha - 1} \right)v}} \right\rbrack}},$ where an upper bound is obtained by tossing the second term, which is negative.

One sees that max_(Z) p(Z)/q(Z) is finite and equals [1/(1−ν)]^(1/2). Indeed, from the densities N(0, I−bb^(T)) and N(0, I) this claim can be established, noting after orthogonal transformation that these measures differ in only one variable, which is either N(0, 1−ν) or N(0, 1), for which the maximum ratio of the densities occurs at the origin and is simply the ratio of the normalizing constants. This completes the demonstration of Lemma 44.
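A minimal numerical illustration of this bound in the one variable where the measures differ, with ν as an assumed example value:

import math, random

nu = 0.5                                   # illustrative value only
c0 = -0.5 * math.log(1 - nu)
max_ratio = 1 / math.sqrt(1 - nu)          # ratio of normalizing constants at the origin
assert abs(max_ratio - math.exp(c0)) < 1e-12

def ratio(z):
    # density of N(0, 1-nu) divided by density of N(0, 1), at the point z
    s2 = 1 - nu
    return math.exp(z * z / 2 - z * z / (2 * s2)) / math.sqrt(s2)

random.seed(0)
assert all(ratio(random.gauss(0, 1)) <= max_ratio + 1e-12 for _ in range(10000))
print(c0, max_ratio)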

With ν=P/(σ²+P), this limit −(½)log[1−ν], which is denoted c₀, is the same as (½)log[1+P/σ²]. That it is the same as the capacity C appears to be coincidental.

Demonstration of Lemma 3:

The task is to show that for events A determined by ℱ_(k), the probability ℙ[A] is not more than ℚ[A]e^(kc₀). Write the probability as an iterated expectation conditioning on ℱ_(k−1). That is, ℙ[A]=𝔼[ℙ[A|ℱ_(k−1)]]. To determine membership in A, conditional on ℱ_(k−1), one only needs χ²_(n−k+1) and Z_(k,J_(k))=(Z_(k,j): j∈J_(k)), where J_(k) is determined by ℱ_(k−1). Thus

ℙ[A]=𝔼_(ℙ)[ℙ_(χ_(n−k+1),Z_(k,J_(k))|ℱ_(k−1))[A]],

where the subscript on the outer expectation is used to denote that it is with respect to ℙ, and the subscripts on the inner conditional probability indicate the relevant variables. For this inner probability switch to the nearby measure

ℚ_(χ_(n−k+1),Z_(k,J_(k))|ℱ_(k−1)).

These conditional measures agree concerning the distribution of the independent χ²_(n−k+1), so the α relative entropy between them arises only from the normal distributions of the Z_(k,J_(k)) given ℱ_(k−1). This α relative entropy is bounded by c₀.

To see this, recall from Lemma 2 that ℙ_(Z_(k,J_(k))|ℱ_(k−1)) is N_(J_(k))(0,Σ_(k)) with Σ_(k)=I−δ_(k)δ_(k)^(T). Now

$\left\| \delta_{k} \right\|^{2} = {v_{k}{\sum\limits_{j \in {{sent}\;\cap\; J_{k}}}{P_{j}/P}}}$ which is (1−(q̂₁+ . . . +q̂_(k−1)))ν_(k). Noting that ν_(k)=ŝ_(k)ν and that ŝ_(k)(1−(q̂₁+ . . . +q̂_(k−1))) is at most 1, get that ∥δ_(k)∥²≦ν. Thus from Lemma 44, for all α≧1, the α relative entropy between ℙ_(Z_(k,J_(k))|ℱ_(k−1)) and the corresponding ℚ conditional distribution is at most c₀.

So with the switch of conditional distribution, a bound is determined with a multiplicative factor of e^(c₀). The bound on the inner expectation is then a function of ℱ_(k−1), so the conclusion follows by induction. This completes the demonstration of Lemma 3.

14.3 Demonstration of Lemmas on the Progress of q_(1,k)

Demonstration of Lemma 6:

Consider any step k with q_(1,k−1)−f_(1,k−1)≦x*. Now x=q_(1,k−1)^(adj) is at least x̃=q_(1,k−1)−f_(1,k−1), where these are initialized to be 0 when k=1. Consider q_(1,k)=g_(L)(x)−η_(k), which is at least g_(L)(x̃)−η_(k), since the function g_(L) is increasing. By the gap property, it is at least x̃+gap(x̃)−η_(k), which in turn is at least q_(1,k−1)−f(x)+gap(x)−η(x), which is at least q_(1,k−1)+gap′.

The increase q_(1,k)−q_(1,k−1) is at least gap′ at each such step, so the number of such steps m−1 is not more than 1/gap′. At the final step, x̃=q_(1,m−1)−f_(1,m−1) exceeds x*, so q_(1,m) is at least g_(L)(x*)−η_(m), which is 1−δ*−η_(m). This completes the demonstration of Lemma 6.

Demonstration of Lemma 5:

With a constant gap bound, the claim when f_(1,k)≦f̄ follows from the above, specializing f and η to be constant. As for the claim when f_(1,k)=kf, it is actually covered by the case that f_(1,k)≦f̄, in view of the choice that f≦f̄/m*. This completes the demonstration of Lemma 5.

14.4 The Gap has not More than One Oscillation

Demonstration of Lemma 21:

In the same manner as the derivative result for g_(num)(x), the g_(low)(x) has derivative with respect to x given by the following function, evaluated at z=z_(x),

$\left\{ {{\frac{\tau\;\Delta_{c}}{2}\left( {1 + \frac{z}{\tau}} \right)^{3}{\phi(z)}} + {\int_{z}^{\infty}{\left( {1 + {t/\tau}} \right)^{2}{\phi(t)}\; dt}}} \right\}{\frac{R}{C^{\prime}}.}$ Subtracting 1+D(δ_(c))/snr from it gives the function der(z), which at z=z_(x) is the derivative with respect to x of G(z_(x))=g_(low)(x)−x−xD(δ_(c))/snr. The mapping from x to z_(x) is strictly increasing, so the sign of der(z) provides the direction of movement of either G(z) or of G(z_(x)).

Consider the behavior of der(z) for z≧−τ, which includes [z₀, z₁]. At z=−τ the first term vanishes and the integral is not more than 1+1/τ², so under the stated condition on R, der(z) starts out negative at z=−τ. Likewise note that der(z) is ultimately negative for large z, since it approaches −(1+D(δ_(c))/snr). Let's see whether der(z) goes up anywhere to the right of −τ. Taking its derivative with respect to z, one obtains

${{der}^{\prime}(z)} = {\left\{ {{{- \frac{\tau\;\Delta_{c}}{2}}\left( {1 + {z/\tau}} \right)^{3}z\;{\phi(z)}} + {\frac{3\Delta_{c}}{2}\left( {1 + {z/\tau}} \right)^{2}{\phi(z)}} - {\left( {1 + {z/\tau}} \right)^{2}{\phi(z)}}} \right\}{\frac{R}{C^{\prime}}.}}$ The interpretation of der′(z) is that since der(z_(x)) is the first derivative of G(z_(x)), it follows that z_(x)′der′(z_(x)) is the second derivative, where z_(x)′, as determined in the proof of Corollary 13, is strictly positive for z>−τ. Thus the sign of the second derivative of the lower bound on the gap is determined by the sign of der′(z).

Factoring out the positive (1+z/τ)²φ(z)R/C′ for z>−τ, the sign of der′(z) is determined by the quadratic expression

−(τΔ_(c)/2)(1+z/τ)z+3Δ_(c)/2−1,

which has value 3Δ_(c)/2−1 at z=−τ and at z=0. The discriminant of whether there are any roots to this quadratic yielding der′(z)=0 is given by (τΔ_(c))²/4−2Δ_(c)(1−3Δ_(c)/2). Its positivity is determined by whether τ²Δ_(c)/4>2−3Δ_(c), that is, whether Δ_(c)>2/(τ²/4+3). If Δ_(c)≦2/(τ²/4+3), which is less than 2/3, then der′(z), which in that case starts out negative at z=−τ, never hits 0, so it stays negative for z≧−τ, so der(z) never goes up to the right of −τ and G(z) remains a decreasing function. In that decreasing case one may take z_(G)=z_(max)=−τ.

If Δ_(c)>2/(τ²/4+3), then by the quadratic formula there is an interval of values of z between the pair of points −τ/2±√(τ²/4−(2/Δ_(c))(1−3Δ_(c)/2)) within which der′(z) is positive, and within the associated interval of values of x the G(z_(x)) is convex in x. Outside of that interval there is concavity of G(z_(x)). So then either der(z) remains negative, so that G(z) is decreasing for z≧−τ, or there is a root z_(crit)>−τ where der(z) first hits 0 and der′(z)>0, i.e. that root, if there is one, is in this interval. Suppose there is such a root. Then from the behavior of der′(z) as a positive multiple of a quadratic with two zero crossings, the function G(z) experiences an oscillation.

Indeed, until that point z_(crit), the der(z) is negative, so G(z) is decreasing. After that root, the der(z) is increasing between z_(crit) and z_(right), the right end of the above interval, so der(z) is positive and G(z) is increasing between those points as well. Now consider z≧z_(right), where der′(z)≦0, strictly so for z>z_(right). At z_(right) the der(z) is strictly positive (in fact maximal) and ultimately for large z the der(z) is negative, so for z>z_(right) the G(z) rises further until a point z=z_(max) where der(z)=0. To the right of that point, since der′(z)<0, the der(z) stays negative and G(z) is decreasing. Thus der(z) is identified as having two roots, z_(crit) and z_(max), and G(z) is unimodal to the right of z_(crit).

To determine the value of der(z) at z=0, evaluate the integral ∫_(z)^(∞)(1+t/τ)²φ(t)dt. In the same manner as in the preceding subsection, it is (1+1/τ²)Φ̄(z)+(2τ+z)φ(z)/τ².

Thus der(z) is

${\frac{R}{C^{\prime}}\left\{ {{\frac{\tau\;\Delta_{c}}{2}\left( {1 + \frac{z}{\tau}} \right)^{3}{\phi(z)}} + {\frac{{2\tau} + z}{\tau^{2}}{\phi(z)}} + {\left( {1 + \frac{1}{\tau^{2}}} \right){\overset{\_}{\Phi}(z)}}} \right\}} - {\left( {1 + \frac{D\left( \delta_{c} \right)}{snr}} \right).}$ At z=0 it is

${\frac{R}{C^{\prime}}\left\{ {{\left( {\frac{\tau\;\Delta_{c}}{2} + \frac{2}{\tau}} \right)\frac{1}{\sqrt{2\pi}}} + {\left( {1 + \frac{1}{\tau^{2}}} \right)/2}} \right\}} - {\left\lbrack {1 + {{D\left( \delta_{c} \right)}/{snr}}} \right\rbrack.}$ It is non-negative if τΔ_(c)/(2√(2π)) exceeds

[1+D(δ_(c))/snr]C′/R−(1+1/τ²)/2−2/(τ√(2π)),

which using C′/R=1+r/τ² is

$\frac{1}{2} + \frac{\left( {r - {1/2}} \right)}{\tau^{2}} + {\frac{D\left( \delta_{c} \right)}{snr}\left( {1 + {r/\tau^{2}}} \right)} - {2/{\left( {\tau\sqrt{2\pi}} \right).}}$ It is this expression which is called half, for it tends to be not much more than ½. For instance, if D(δ_(c))/snr≦1/2 and (3/2)r≦(2/√(2π))τ, then this expression is not more than 1−1/(2τ²), which is less than 1.

So then der(z) is non-negative at z=0 if

$\Delta_{c} \geq {\frac{2\sqrt{2\pi}\mspace{11mu}{half}}{\tau}.}$ Non-negativity of der(0) implies that the critical value of the function G satisfies z_(G)≦0.

Suppose on the other hand that der(0)<0. Then Δ_(c)<2√(2π)half/τ, which is less than 2/3 when τ is at least 3√(2π)half. Using the condition Δ_(c)≦2/3, the der′(z)<0 for z>0. It follows that G(z) is decreasing for z>0, and both z_(G) and z_(max) are non-positive.

Next consider the behavior of the function A(z), for which it is hereshown that it too has at most one oscillation. Differentiating andcollecting terms obtain that A′(z) isA′(z)=2(1−Δ_(c))(z+τ)Φ(z)+Δ_(c)(z+τ)²φ(z).

Consider values of z in I_(τ)=(−τ,∞) to the right of −τ. Factoring out2(z+τ), the sign behavior of A′(z) is determined by the functionM(z)=−(1−Δ_(c))Φ(z)+(Δ_(c)/2)(z+τ)φ(z).This function M(z) is negative for large z as it converges to−2(1−Δ_(c)). Thus A(z) is decreasing for large z. At z=−τ the sign ofM(z) is determined by whether Δ_(c)<1, if so then M(z) starts outnegative, so then A(z) is initially decreasing, whereas in the unusualcase of Δ_(c)≧1, the A(z) is initially increasing and so set z_(A)=−τ.Consider the derivative of M(z) given byM′(z)=−[1−3Δ_(c)/2+(Δ_(c)/2)z(z+τ)]φ(z).The expression in brackets is the same quadratic function of zconsidered above. It is centered and extremal at z_(cent)=−τ/2. Thisquadratic attains the value 0 only if Δ_(c) is at leastΔ_(c)*=2/(τ²/4+3).

For Δ_(c)<Δ_(c)*, which is less than 1, the M′(z) stays negative andconsequently M(z) is decreasing, so M(z) and A′(z) remains negative forz>−τ. Then A(z) is decreasing in I_(τ) (which actually implies themonotonicity of G(z) under the same condition on Δ_(c)).

For Δ_(c)≧Δ_(c)*, for which the function M′(z) does cross 0, this M′(z)is positive in the interval of values of z centered at z_(cent)=−τ/2 andheading up to the point z_(right) previously discussed. In this intervalincluding [−τ/2, z_(right)] the function M(z) is increasing.

Let's see whether M(z) is positive at or to the left of z_cent. For Δ_c > 1 that positivity already occurred at and just to the right of −τ. For Δ_c ≤ 1, use the inequality Φ(z) ≤ φ(z)/(−z) for z < 0. This lower bound is sufficient to demonstrate positivity in an interval of values of z centered at the same point z_cent = −τ/2, provided Δ_cτ²/4 is at least 2(1−Δ_c), that is, Δ_c at least Δ_c** = 2/(τ²/4+2). Then z_A is not more than the left end of this interval, which is less than −τ/2. For Δ_c ≥ Δ_c**, this interval is where the same quadratic z(z+τ) is less than −2(1−Δ_c)/Δ_c. Then the M(z) is positive at −τ/2 and furthermore increasing from there up to z_right, while further to the right it is decreasing and ultimately negative. It follows that such M(z) has only one root to the right of −τ/2. The A′(z) inherits the same sign and root characteristics as M(z), so A(z) is unimodal to the right of −τ/2.

If Δ_c is between Δ_c* and Δ_c**, the lower bound invoked is insufficient to determine the precise conditions of positivity of M(z) at z_cent, so resort in this case to the milder conclusion, from the negativity of M′(z) to the right of z_right, that M(z) is decreasing there, and hence it and A′(z) have at most one root to the right of that point, so A(z) is unimodal there. Being less than Δ_c**, the value of Δ_c is small enough that 2/Δ_c > τ²/4 + 2, and hence z_right is not more than [−τ+2]/2, which is −τ/2 + 1.

This completes the demonstration of Lemma 21.

It is remarked concerning G(z) that one can pin down the location of z_G further. Under conditions on Δ_c, it is near to and not more than a value near

$-\sqrt{2\,\log\left( \frac{1}{2\pi}\,\frac{\tau\Delta_c/2}{D(\delta_c)/snr + (r-1)/\tau^2} \right)},$

provided the argument of the logarithm is of a sufficient size. As said, precise knowledge of the value of z_G is not essential because the shape properties allow one to take advantage of the tight lower bounds on A(z) for negative z.

14.5 The Gap in the Constant Power Case

Demonstration of Corollary 19.

We are to show under the stated conditions that g(x) − x is smallest in [0, x*] at x = x*, when the power allocation is constant. For x in [0,1] the function z_x is one to one. In this u_cut = 1 case, it is equal to z_x = [√((1+r/τ²)/(1−xν)) − 1]τ. It starts at x = 0 with z₀ and at x = x* it is ζ. Note that (1+z₀/τ)² = 1 + r/τ². If r ≥ 0 then z₀ ≥ 0, while, in any case, for r > −τ² the z₀ at least exceeds −τ. Invert the formula for z = z_x to express x in terms of z. Using g(x) = Φ(z) and subtracting the expression for x, we want the minimum of the function

$G(z) = \Phi(z) - \frac{1}{\nu}\left(1 - \frac{1+r/\tau^2}{(1+z/\tau)^2}\right).$

Its value at z₀ is G(z₀) = Φ(z₀). Consider the minimization of G(z) for z₀ ≤ z ≤ ζ, but take advantage, when it is helpful, of properties for all z > −τ. The first derivative is

$\phi(z) - \frac{2}{\nu\tau}\,\frac{1+r/\tau^2}{(1+z/\tau)^3},$

ultimately negative for very large z. This function has 0, 1, or 2 roots to the right of −τ. Indeed, for it to be zero means that z solves

$z^2 - 6\,\log(1+z/\tau) = 2\,\log(\nu\tau/c),$

where c = 2(1+r/τ²)√(2π). The function on the left side, υ(z) = z² − 6 log(1+z/τ), is convex, with a value of 0 and a negative slope at z = 0, and it grows without bound for large z. This function reaches its minimum value (let us call it υ_min < 0) at a point z = z_crit > 0, which solves 2z − 6/(τ+z) = 0, given by z_crit = (τ/2)[√(1+12/τ²) − 1], not more than 3/τ.

When υ_min > 2 log(ντ/c) there are no roots, so G(z) is decreasing for z > −τ and has its minimum on [0, ζ] at z = ζ.

When 2 log(ντ/c) is positive (that is, when ντ > c, which is the condition stated in the corollary), it exceeds the value of the expression on the left at z = 0, and G is increasing there. So from the indicated shape of the function υ(z), there is one root to the right of 0, which must be a maximizer of G(z), since G(z) is eventually decreasing. So then G(z) is unimodal for positive z, and so if z₀ ≥ 0 its minimum in [z₀, ζ] is at either z = z₀ or z = ζ, and this minimum is at least min{G(0), G(ζ)}. The value at z = 0 is G(0) = ½. So, with (r − r_up)/[snr(τ² + r₁)] less than ½, the minimum for z ≥ 0 occurs at z = ζ, which demonstrates the first conclusion of Corollary 19.
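A minimal numerical sketch of this root classification may clarify the case analysis; the parameter values τ, snr, and r below are illustrative assumptions, not values prescribed by the disclosure.

# Classify the stationary points of G(z) in the constant power case.
import math

tau, snr, r = 15.0, 1.0, 0.2        # illustrative assumptions
nu = snr / (1.0 + snr)
c = 2.0 * (1.0 + r / tau**2) * math.sqrt(2.0 * math.pi)

# upsilon(z) = z^2 - 6 log(1 + z/tau) is convex on (-tau, infinity).
def upsilon(z):
    return z * z - 6.0 * math.log(1.0 + z / tau)

# Its minimizer z_crit = (tau/2)[sqrt(1 + 12/tau^2) - 1] is not more than 3/tau.
z_crit = 0.5 * tau * (math.sqrt(1.0 + 12.0 / tau**2) - 1.0)
assert z_crit <= 3.0 / tau
upsilon_min = upsilon(z_crit)
threshold = 2.0 * math.log(nu * tau / c)

if upsilon_min > threshold:
    print("no roots of G': G is decreasing for z > -tau")
elif threshold > 0.0:
    print("nu*tau > c: G' has one root right of 0, so G is unimodal for positive z")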

If r is negative then z₀ < 0, and consider the shape of G(z) for negative z. Again with the assumption that 2 log(ντ/c) is positive, the function G(z) for z ≥ −τ is seen to have a minimizer at a negative z = z_min solving z² = 2 log(ντ/c) − 6 log(1+z/τ), where G′(z) = 0, and G(z) is increasing between z_min and 0. It is inquired as to whether G(z) is increasing at z₀. If it is, then z₀ ≥ z_min and G(z) is unimodal to the right of z₀. The value of the derivative there is

$\phi(z_0) - \frac{2}{\nu(\tau+z_0)},$

which is positive if

${z_{0}} \leq {\sqrt{2{\log\left( {{{v\left( {\tau + z_{0}} \right)}/2}\sqrt{2\pi}} \right)}}.}$As shall be seen momentarily, z₀ is between r/τ and r/2τ, so thispositive derivative condition is implied by

$\frac{r}{\tau} \ge -\sqrt{2\,\log\left( \frac{\nu(\tau+r/\tau)}{2\sqrt{2\pi}} \right)}.$

Then G(z) is unimodal to the right of z₀ and has minimum equal to min{G(z₀), G(ζ)}.

From the relationship (1+z₀/τ)² = 1 + r/τ², with −τ < z₀ ≤ 0, one finds that r = z₀(2τ+z₀), so it follows that z₀ = r/(2τ+z₀) is between r/τ and r/(2τ).

Lower bound G(z₀) = Φ(z₀) for z₀ ≤ 0 by the tangent line ½ + z₀φ(0), which is at least ½ + r/(τ√(2π)). Thus when r is such that the positive derivative condition holds, there is the gap lower bound, allowing r_up < r ≤ 0, which is

min{½ + r/(τ√(2π)), (r − r_up)/[snr(τ² + r₁)]}.

This completes the demonstration of Corollary 19.

Next it is asked whether a useful bound might be available if G(z) is not increasing at this z₀ ≤ 0. Then z₀ ≤ z_min, and the minimum of G(z) in [z₀, ζ] is either at z_min or at ζ. The G(z) is

$\Phi(z) + \frac{1}{\nu}\,\frac{(1+z_0/\tau)^2 - (1+z/\tau)^2}{(1+z/\tau)^2}.$

Now since z_min is the negative solution to z² = 2 log(ντ/c) − 6 log(1+z/τ), it follows that z_min is near −√(2 log(ντ/c)). From the difference of squares, the second part of G(z_min) is near 2(z₀ − z_min)/(ντ), which is negative. So for G(z_min) to be positive, the Φ(z_min) would need to overcome that term. Now Φ(z_min) is near φ(z_min)/|z_min|, and G′(z) = 0 at z_min means that φ(z_min) equals the value (2/(ντ))(1+z₀/τ)²/(1+z_min/τ)³. Accordingly, G(z_min) is near

$\frac{2(1+z_0/\tau)^2}{\nu\tau\sqrt{2\,\log(\nu\tau/c)}} + \frac{2\left(z_0 + \sqrt{2\,\log(\nu\tau/c)}\right)}{\nu\tau}.$

The implication is that by choice of r one cannot push z₀ much to the left of −√(2 log(ντ/c)) without losing positivity of G(z).

Next examine, when r_up is negative, whether r arbitrarily close to r_up can satisfy the conditions. That would require the r_up/τ to be greater than −√(2π)/2 and greater than

$-\sqrt{2\,\log\left( \frac{\nu\tau(1+r_{up}/\tau^2)}{2\sqrt{2\pi}} \right)}.$

However, in view of the formula for r_up, it is near [1/(1+snr) − 1]τ² = −ντ² when snr Φ̄(ζ) and ζ/τ are small. Consequently, r_up/τ is near −ντ. So if ντ is greater than a constant near √(2π)/2, then the first of these conditions on r_up/τ is not satisfied. Also, with this r_up/τ near −ντ, the argument of the logarithm becomes ν(1−ν)τ/(2√(2π)), needed to be greater than 1. So if ντ is less than a constant near √(2π)/2, then this argument of the logarithm is strictly less than 1. Thus the conditions for allowance of such negative r so close to r_up are vacuous. It is not possible to use an r so close to r_up when it is negative.

If, when r_up/τ is negative, near −ντ, one tries instead r/τ = −α√(2π)/2 with 0 ≤ α < 1, then the first expression in the minimum becomes (1−α)/2, the second expression becomes (r − r_up)/[ν(τ+ζ)²], near 1 + r/(ντ²), equal to 1 − α√(2π)/(2ντ), and the additional condition becomes

$\frac{\alpha\sqrt{2\pi}}{2} \le \sqrt{2\,\log\left( \nu\left(\frac{\tau}{2\sqrt{2\pi}} - \frac{\alpha}{4}\right) \right)},$

which is acceptable with ντ at least a little more than 2√(2π)e^(π/4). So in this way the 1 + r/τ² factor becomes at best near 1 − √(2π)/(2τ). That is indeed a nice improvement factor in the rate, though not as ambitious as the unobtainable 1 + r_up/τ², near 1 − ν.
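As a check on the constant 2√(2π)e^(π/4): at the worst case α near 1 the left side approaches √(2π)/2, whose square is π/2, so the displayed condition amounts to

$\frac{\pi}{2} \le 2\,\log\left( \nu\left(\frac{\tau}{2\sqrt{2\pi}} - \frac{1}{4}\right) \right), \qquad \text{that is,} \qquad \nu\left(\frac{\tau}{2\sqrt{2\pi}} - \frac{1}{4}\right) \ge e^{\pi/4},$

which holds once ντ is a little more than 2√(2π)e^(π/4).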

A particular negative r of interest would be one that makes (1 + D(snr)/snr)(1 + r/τ²) = 1, for then even with constant power it would provide no rate drop from capacity. With this choice 1 + r/τ² = 1/(1 + D(snr)/snr), the

$\frac{r}{\tau} = \frac{-\tau\,D(snr)/snr}{1 + D(snr)/snr}.$

That is a multiple of −τ, where the multiple is near snr/2 when snr is small. For G(z) to be increasing at the corresponding z₀, it is desired that the magnitude −r/τ be less than

$\sqrt{2\,\log\left( \frac{\nu\tau(1+r/\tau^2)}{2\sqrt{2\pi}} \right)},$

where the ν(1+r/τ²)/2 may be expressed as a function of snr, and is also near snr/2 when snr is small. But that would mean that b = τ snr/2 is a value where b² ≤ 2 log(b/√(2π)), which a little calculus shows is not possible: 2 log(b/√(2π)) − b² is maximized at b = 1, where it equals −1 − log(2π) < 0. Likewise, the above development of the case that z₀ is to the left of z_min shows that one cannot allow −r/τ to be much greater than the same value.

14.6 The Variance of Σ_{j sent} π_j 1_{H_{λ,k,j}}

The variance of this weighted sum of Bernoulli random variables that is to be controlled is V/L = Σ_{j sent} π_j² Φ(μ_{k,j}) Φ̄(μ_{k,j}) with μ_{k,j} = shift_{k,j} − τ. The shift_{k,j} may be written as √(c_k π_j) τ, where c_k = νL(1−h′)/(2R(1−xν)(1+δ_a)²), evaluated at x = q_{1,k−1}^{adj}. Thus

$V/L = \sum_{\ell} \pi_{(\ell)}^2\, \Phi\bar{\Phi}\!\left( \left(\sqrt{c_k \pi_{(\ell)}} - 1\right)\tau \right),$

where ΦΦ̄(z) is the function formed by the product Φ(z)Φ̄(z).

In the no-leveling (c = 0) case, π_(ℓ) = e^{−2C(ℓ−1)/L}·2C/(νL) and c_k π_(ℓ) = u_ℓ R′/(R(1−xν)) with R′ = C(1−h′)/(1+δ_a)², where u_ℓ = e^{−2C(ℓ−1)/L}.

With a quantifiably small error as hereinbefore, now replace the sum over the grid of values of t = ℓ/L in [0,1] with the integral over this interval, yielding the value

$V = \left(\frac{2C}{\nu}\right)^2 \int_0^1 e^{-4Ct}\, \Phi\bar{\Phi}\!\left( \left(\sqrt{\frac{e^{-2Ct}\,R'/R}{1-x\nu}} - 1\right)\tau \right) dt.$

Changing variables to ũ = e^{−Ct}, it is expressed as

$V = \frac{(2C/\nu)^2}{C} \int_{e^{-C}}^1 \tilde{u}^3\, \Phi\bar{\Phi}\!\left( \left(\tilde{u}\sqrt{\frac{R'/R}{1-x\nu}} - 1\right)\tau \right) d\tilde{u}.$

To upper bound it, replace the ũ³ factor with 1 and change variables further to

$z = \left(\tilde{u}\sqrt{\frac{R'/R}{1-x\nu}} - 1\right)\tau.$

Thereby obtain an upper bound on V of

$\frac{\sqrt{1-x\nu}}{\tau\sqrt{R'/R}}\, \frac{(2C/\nu)^2}{C} \int \Phi\bar{\Phi}(z)\, dz.$

Now ΦΦ̄(z) has the upper bound (¼)e^{−z²/2}, which is (√(2π)/4)φ(z), which when integrated on the line yields

$V \le \frac{\sqrt{1-x\nu}}{\sqrt{R'/R}}\, \frac{\sqrt{2\pi}\,C}{\nu^2\,\tau}.$

When R ≤ R′, then using √(R′/R) ≥ 1 and √(1−xν) ≤ 1, it yields

$V \le \frac{\sqrt{2\pi}\,C}{\nu^2\,\tau}.$

This provides the desired upper bound on the variance.

14.7 Slight Improvement to the Variance of Σ_{j sent} π_j 1_{H_{λ,k,j}}

The variance of this weighted sum of Bernoulli random variables that is desired to be controlled is V/L = Σ_{j sent} π_j² Φ(μ_{k,j}) Φ̄(μ_{k,j}) with μ_{k,j} = shift_{k,j} − τ. The shift_{k,j} may be written as √(c_k π_j) τ, where c_k = νL(1−h′)/(2R(1−xν)(1+δ_a)²), evaluated at x = q_{1,k−1}^{adj}. Thus

$V/L = \sum_{\ell} \pi_{(\ell)}^2\, \Phi\bar{\Phi}\!\left( \left(\sqrt{c_k \pi_{(\ell)}} - 1\right)\tau \right),$

where ΦΦ̄(z) is the function formed by the product Φ(z)Φ̄(z).

In the no-leveling (c = 0) case, π_(ℓ) = e^{−2C(ℓ−1)/L}·2C/(νL) and c_k π_(ℓ) = u_ℓ R′/(R(1−xν)) with R′ = C(1−h′)/(1+δ_a)², where u_ℓ = e^{−2C(ℓ−1)/L}.

With a quantifiably small error as before, replace the sum over the grid of values of t = ℓ/L in [0,1] with the integral over this interval, yielding the value

$V = \left(\frac{2C}{\nu}\right)^2 \int_0^1 e^{-4Ct}\, \Phi\bar{\Phi}\!\left( \left(\sqrt{\frac{e^{-2Ct}\,R'/R}{1-x\nu}} - 1\right)\tau \right) dt.$

Changing variables to ũ = e^{−Ct}, it is expressed as

$V = \frac{(2C/\nu)^2}{C} \int_{e^{-C}}^1 \tilde{u}^3\, \Phi\bar{\Phi}\!\left( \left(\tilde{u}\sqrt{\frac{R'/R}{1-x\nu}} - 1\right)\tau \right) d\tilde{u}.$

To upper bound the above expression, change variables further to

$z = \left(\tilde{u}\sqrt{\frac{R'/R}{1-x\nu}} - 1\right)\tau.$

Thereby obtain for V the expression

$\frac{(1-x\nu)^2}{\tau\,(R'/R)^2}\, \frac{(2C/\nu)^2}{C} \int_{(e^{-C}c_0-1)\tau}^{(c_0-1)\tau} \left(1+\frac{z}{\tau}\right)^3 \Phi\bar{\Phi}(z)\, dz,$

where c₀ = √((R′/R)/(1−xν)). Now notice that (e^{−C}c₀ − 1)τ is at least −τ, making 1 + z/τ ≥ 0 on the interval of integration. Accordingly, the above integral can be bounded from above by

$\int_{z \ge -\tau} \left(1+\frac{z}{\tau}\right)^3 \Phi\bar{\Phi}(z)\, dz.$

Further, the integral of (1+z/τ)³ΦΦ̄(z) for z ≤ −τ is a negligible term that is polynomially small in 1/B. Ignore that term in the rest of the analysis. Correspondingly, it is desired to bound the integral

$\frac{(1-x\nu)^2}{\tau\,(R'/R)^2}\, \frac{(2C/\nu)^2}{C} \int \left(1+\frac{z}{\tau}\right)^3 \Phi\bar{\Phi}(z)\, dz.$

Noticing that ΦΦ̄(z) is a symmetric function, the terms that involve z and z³ after the expansion of (1+z/τ)³ above vanish upon integrating. Consequently, one only needs to bound the integral of ΦΦ̄(z) and of z²ΦΦ̄(z), the latter entering with the coefficient 3/τ². Doing this numerically, the integral of the former is bounded by a₁ = 0.57 and that of the latter is bounded by a₂ = 0.48.
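A minimal numerical sketch confirming these two constants (assuming the scipy library is available; the integration range is an assumption that captures essentially all of the mass):

# Numerically bound the integrals of Phi*Phibar and z^2*Phi*Phibar, confirming
# the constants a1 = 0.57 and a2 = 0.48 quoted above. The exact value of the
# first integral is 1/sqrt(pi), about 0.5642.
from scipy.integrate import quad
from scipy.stats import norm

phiphibar = lambda z: norm.cdf(z) * norm.sf(z)   # Phi(z) * (1 - Phi(z))

i0, _ = quad(phiphibar, -20, 20)
i2, _ = quad(lambda z: z * z * phiphibar(z), -20, 20)

print(i0, "<= 0.57:", i0 <= 0.57)    # about 0.5642
print(i2, "<= 0.48:", i2 <= 0.48)    # about 0.468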

So ignoring the polynomially small term, the variance can be bounded by

$V \le \frac{(1-x\nu)^2}{(R'/R)^2}\, \frac{(2C/\nu)^2}{C\,\tau} \left(a_1 + \frac{3a_2}{\tau^2}\right),$

which, for R ≤ R′, is less than

$(1-x\nu)^2\, \frac{4C}{\nu^2\,\tau} \left(a_1 + \frac{3a_2}{\tau^2}\right).$

Bound the above quantity by (4C/ν²)(a₁ + 3a₂/τ²)/τ. Ignore the 3a₂/τ² term, since it is of smaller order. Then the variance can be bounded by 1.62 C/(ν²√(log B)), where it is used that τ ≥ √(2 log B) and that 4a₁/√2 is less than 1.62.

14.8 Normal Tails

Let Z be a standard normal random variable and let φ(z) be its probability density function, Φ(z) its cumulative distribution function, and Φ̄(z) = 1 − Φ(z) its upper tail probability for z > 0. Here collect some properties of this probability, beginning with a conclusion from Feller. Most familiar is his bound Φ̄(z) ≤ (1/z)φ(z), which may be stated as φ(z)/Φ̄(z) being at least z. His lower bound Φ̄(z) ≥ (1/z − 1/z³)φ(z) has certain natural improvements, which are expressed through upper bounds on φ(z)/Φ̄(z) showing how close it is to z.

Lemma 45.

For positive z the upper tail probability

P{Z > z} = Φ̄(z) satisfies Φ̄(z) ≤ (√(2π)/2)φ(z) and satisfies the Feller expansion

$\bar{\Phi}(z) \sim \phi(z)\left( \frac{1}{z} - \frac{1}{z^3} + \frac{3}{z^5} - \frac{3\cdot 5}{z^7} + \ldots \right),$

with terms of alternating sign, where terminating with any term of positive sign produces an upper bound and terminating with any term of negative sign produces a lower bound. Furthermore, for z > 0 the ratio φ(z)/Φ̄(z) is increasing and is less than z + 1/z. Further improved bounds are that it is less than ξ(z), equal to 2 for 0 ≤ z ≤ 1 and equal to z + 1/z for z ≥ 1, and, slightly better, φ(z)/Φ̄(z) is less than [z + √(z²+4)]/2. Moreover, the positive φ(z)/Φ̄(z) − z is a decreasing function of z.

Demonstration of Lemma 45:

The expansion is from the book by Feller (1968, Vol. 1, Chap. VII), where it is noted in particular that the first order upper bound Φ̄(z) < (1/z)φ(z) is obtained from φ′(t) = −tφ(t) by noting that zΦ̄(z) = z∫_z^∞ φ(t)dt is less than ∫_z^∞ tφ(t)dt = φ(z). Thus the ratio φ(z)/Φ̄(z) exceeds z. It follows that the derivative of the ratio φ(z)/Φ̄(z), which is [φ(z)/Φ̄(z) − z]·φ(z)/Φ̄(z), is positive, so this ratio is increasing and at least its value at z = 0, which is 2/√(2π).

Now for any positive c consider the positive integral ∫_z^∞ (t/c − 1)² φ(t) dt. By expanding the square and using that (t² − 1)φ(t) is the derivative of −tφ(t), one sees that this integral is (1 + 1/c²)Φ̄(z) − (2/c − z/c²)φ(z). Multiplying through by c², and assuming 2c > z, its positivity gives the family of bounds

$\frac{\phi(z)}{\bar{\Phi}(z)} \le \frac{c^2+1}{2c-z}.$

Evaluating it at c = z gives the upper bound on the ratio of (z²+1)/z = z + 1/z. Note that since z/(z²+1) equals 1/z − 1/[z(z²+1)], it improves on 1/z − 1/z³ for all z ≥ 0. Since φ(z)/Φ̄(z) is increasing, one can replace the upper bound z + 1/z with its lower increasing envelope, which is the claimed bound ξ(z), noting that z + 1/z takes its minimum value of 2 at z = 1 and is increasing thereafter. For further improvement note that φ(z)/Φ̄(z) equals a value not more than 1.53 at z = 1, so the bound 2 for 0 ≤ z ≤ 1 may be replaced by 1.53.

Next let's determine the best bound of the above form by optimizing the choice of c. The derivative of the bound is the ratio of 2c(2c − z) − 2(c² + 1) and (2c − z)², and the c that sets it to 0 solves c² − zc − 1 = 0, for which c = [z + √(z²+4)]/2, and the above bound is then equal to this c.

As for the monotonicity of φ(z)/Φ̄(z) − z, its derivative is (φ/Φ̄)² − z(φ/Φ̄) − 1, which is a quadratic in the positive quantity φ/Φ̄, abbreviating φ(z)/Φ̄(z). Hence, by inspecting the quadratic formula, this derivative is negative if φ/Φ̄ is less than or equal to [z + √(z²+4)]/2, which it is by the above bound. This completes the demonstration of Lemma 45.

It is remarked that log φ(z)/Φ̄(z) has first derivative φ(z)/Φ̄(z) − z, equal to the quantity studied in this lemma, and second derivative found above to be negative. So the fact that φ(z)/Φ̄(z) − z is decreasing is equivalent to the normal hazard function φ(z)/Φ̄(z) being log-concave.
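A minimal numerical sketch spot-checking the bounds of Lemma 45 (assuming scipy for the normal tail):

# Spot-check the bounds on the hazard ratio phi(z)/Phibar(z) from Lemma 45.
import math
from scipy.stats import norm

def ratio(z):
    return norm.pdf(z) / norm.sf(z)

for z in [0.0, 0.5, 1.0, 2.0, 4.0]:
    xi = 2.0 if z <= 1.0 else z + 1.0 / z          # the increasing envelope xi(z)
    best = 0.5 * (z + math.sqrt(z * z + 4.0))      # the optimized bound
    assert ratio(z) >= z                           # Feller: the ratio exceeds z
    assert ratio(z) <= best <= xi                  # optimized bound, then xi(z)
    print(z, ratio(z), best, xi)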

14.9 Tails for Weighted Bernoulli Sums

Lemma 46.

Let W_j, 1 ≤ j ≤ N, be N independent Bernoulli(r_j) random variables. Furthermore, let α_j, 1 ≤ j ≤ N, be non-negative weights that sum to 1, and let N_α = 1/max_j α_j. Then the weighted sum r̂ = Σ_j α_j W_j, which has mean given by r* = Σ_j α_j r_j, satisfies the following large deviation inequalities. For any r with 0 < r < r*,

P(r̂ < r) ≤ exp{−N_α D(r‖r*)},

and for any r̃ with r* < r̃ < 1,

P(r̂ > r̃) ≤ exp{−N_α D(r̃‖r*)},

where D(r‖r*) denotes the relative entropy between Bernoulli random variables of success parameters r and r*.
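As an illustration of the first inequality, a minimal Monte Carlo sketch (the weights, success rates, and trial count are assumptions chosen for the example):

# Monte Carlo check of P(rhat < r) <= exp(-N_alpha * D(r || r*)).
import numpy as np

rng = np.random.default_rng(0)
N = 200
alpha = rng.random(N)
alpha /= alpha.sum()                      # non-negative weights summing to 1
rj = np.full(N, 0.6)                      # Bernoulli success parameters
rstar = float(alpha @ rj)                 # mean of the weighted sum
N_alpha = 1.0 / alpha.max()

def d_ber(r, rs):                         # Bernoulli relative entropy D(r||rs)
    return r * np.log(r / rs) + (1 - r) * np.log((1 - r) / (1 - rs))

r = 0.5                                   # any 0 < r < rstar
rhat = (rng.random((20000, N)) < rj) @ alpha
print(float((rhat < r).mean()), "<=", float(np.exp(-N_alpha * d_ber(r, rstar))))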

Demonstration of Lemma 46:

Let's prove the first part. The proof of the second part is similar.

Denote the event

$A = \left\{ \underline{W} : \sum_j \alpha_j W_j \le r \right\},$

with W denoting the N-vector of W_j's. Proceeding as in Csiszár (Ann. Probab. 1984), it follows that

$P(A) = \exp\left\{ -D\left(P_{\underline{W}|A} \,\|\, P_{\underline{W}}\right) \right\} \le \exp\left\{ -\sum_j D\left(P_{W_j|A} \,\|\, P_{W_j}\right) \right\}.$

Here P_{W|A} denotes the conditional distribution of the vector W conditional on the event A, and P_{W_j|A} denotes the associated marginal distribution of W_j conditioned on A. Now

$\sum_j D\left(P_{W_j|A} \,\|\, P_{W_j}\right) \ge N_\alpha \sum_j \alpha_j\, D\left(P_{W_j|A} \,\|\, P_{W_j}\right).$

Furthermore, the convexity of the relative entropy implies that

$\sum_j \alpha_j\, D\left(P_{W_j|A} \,\|\, P_{W_j}\right) \ge D\left( \sum_j \alpha_j P_{W_j|A} \,\Big\|\, \sum_j \alpha_j P_{W_j} \right).$

The sums on the right denote α-mixtures of the distributions P_{W_j|A} and P_{W_j}, respectively, which are distributions on {0, 1}, and hence these mixtures are also distributions on {0, 1}. In particular, Σ_j α_j P_{W_j} is the Bernoulli(r*) distribution and Σ_j α_j P_{W_j|A} is the Bernoulli(r_e) distribution, where

$r_e = E\left[ \sum_j \alpha_j W_j \,\middle|\, A \right] = E\left[ \hat{r} \,\middle|\, A \right].$

But in the event A it holds that r̂ ≤ r, so it follows that r_e ≤ r. As r < r*, this yields D(r_e‖r*) ≥ D(r‖r*). This completes the demonstration of Lemma 46.

14.10 Lower Bounds on D

Lemma 47.

For p ≥ p* the relative entropy between Bernoulli(p) and Bernoulli(p*) distributions has the succession of lower bounds

$D_{Ber}(p\|p^*) \ge D_{Poi}(p\|p^*) \ge 2\left(\sqrt{p}-\sqrt{p^*}\right)^2 \ge \frac{(p-p^*)^2}{2p},$

where D_{Poi}(p‖p*) = p log(p/p*) + p* − p is also recognizable as the relative entropy between Poisson distributions of means p and p*, respectively.

Remark A:

There are analogous statements for pairs of probability distributions P and P* on a measurable space χ with densities p(x) and p*(x) with respect to a dominating measure μ. The relative entropy D(P‖P*), which is ∫ p(x) log(p(x)/p*(x)) μ(dx), may be written as the integral of the non-negative integrand p(x) log(p(x)/p*(x)) + p*(x) − p(x), which exceeds (½)(p(x) − p*(x))²/max{p(x), p*(x)}. It is familiar that D(P‖P*) exceeds the squared Hellinger distance H²(P, P*) = ∫ (√(p(x)) − √(p*(x)))² μ(dx). That fact arises, for instance, via Jensen's inequality, from which D exceeds 2 log 1/(1 − (1/2)H²), which in turn is at least H².

Remark B:

When p̂ is the relative frequency of occurrence in N independent Bernoulli trials, it has the bound P{p̂ > p} ≤ e^{−N D_{Ber}(p‖p*)} on the upper tail of the Binomial distribution of Np̂ for p > p*. In accordance with the Poisson interpretation of the lower bound on the exponent, one sees that this upper tail of the Binomial is in turn bounded by the corresponding large deviation expression that would hold if the random variables were Poisson.

Demonstration of Lemma 47:

The Bernoulli relative entropy may be expressed as the sum of two positive terms, one of which is p log(p/p*) + p* − p, and the other of which is the corresponding term with 1−p and 1−p* in place of p and p*, so this demonstrates the first inequality. Now suppose p > p*. Write p log(p/p*) + p* − p as p*F(s), where F(s) = 2s² log s + 1 − s² with s² = p/p*, which is at least 1. This function F and its first derivative F′(s) = 4s log s have value equal to 0 at s = 1, and its second derivative F″(s) = 4 + 4 log s is at least 4 for s ≥ 1. So by second order Taylor expansion, F(s) ≥ 2(s−1)² for s ≥ 1. Thus p log(p/p*) + p* − p is at least 2(√p − √(p*))². Furthermore, 2(s−1)² ≥ (s²−1)²/(2s²) as, taking the square root of both sides, it is seen to be equivalent to 2s(s−1) ≥ s²−1, which, factoring out s−1 from both sides, is seen to hold for s ≥ 1. From this the final lower bound (p−p*)²/(2p) is obtained. This completes the demonstration of Lemma 47.
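A minimal numerical sketch verifying the chain of inequalities of Lemma 47 on a grid of (p, p*) pairs with p ≥ p*:

# Check D_Ber >= D_Poi >= 2(sqrt(p)-sqrt(p*))^2 >= (p-p*)^2/(2p) on a grid.
import numpy as np

def d_ber(p, ps):
    return p * np.log(p / ps) + (1 - p) * np.log((1 - p) / (1 - ps))

def d_poi(p, ps):
    return p * np.log(p / ps) + ps - p

for ps in [0.01, 0.1, 0.3, 0.5]:
    for p in np.linspace(ps, 0.99, 50):
        chain = (d_ber(p, ps), d_poi(p, ps),
                 2.0 * (np.sqrt(p) - np.sqrt(ps)) ** 2, (p - ps) ** 2 / (2.0 * p))
        assert all(chain[i] >= chain[i + 1] - 1e-12 for i in range(3))
print("the chain of lower bounds holds on the grid")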

ACKNOWLEDGMENT

The inventors herein thank Creighton Hauikulani, who performed simulations of the decoder and earlier incarnations of it in fall 2009 and spring 2010 while completing masters studies in the Department of Statistics at Yale.

FINAL STATEMENT

Although the invention has been described in terms of specific embodiments and applications, persons skilled in the art may, in light of this teaching, generate additional embodiments without exceeding the scope or departing from the spirit of the invention described and claimed herein. Accordingly, it is to be understood that the drawing and description in this disclosure are proffered to facilitate comprehension of the invention, and should not be construed to limit the scope thereof.

What is claimed is:
 1. A sparse superposition encoder for a structured code for encoding digital information for transmission over a data channel, the encoder comprising: a memory for storing a design matrix formed of a plurality of column vectors X₁, X₂, . . . , X_N, each such vector having n coordinates; and an input for entering a sequence of input bits u₁, u₂, . . . , u_K which determine a plurality of coefficients β₁, . . . , β_N, each of the coefficients being associated with a respective one of the vectors of the design matrix to form codeword vectors, with real or complex-valued entries, in the form of superpositions β₁X₁+β₂X₂+ . . . +β_N X_N, the sequence of bits u₁, u₂, . . . , u_K constituting at least a portion of the digital information; wherein: (1) at least some of the plurality of the coefficients β_j have a predetermined value multiplied selectably by +1, or the predetermined value multiplied by −1; (2) at least some of the plurality of the coefficients β_j have a zero value, a number of the plurality of the coefficients β_j having a non-zero value being denoted L and the value B=N/L controlling an extent of sparsity; (3) a dictionary is generated by independent standard normal random variables; or (4) the dictionary is generated by independent, equiprobable, +1 or −1, random variables.
 2. The encoder of claim 1, wherein the plurality of the coefficients β_j have selectably the non-zero value, or the zero value.
 3. The encoder of claim 1, wherein (1) at least some of the plurality of the coefficients β_j have the predetermined value multiplied selectably by the +1, or the predetermined value multiplied by the −1.
 4. The encoder of claim 1, wherein (2) at least some of the plurality of the coefficients β_j have the zero value, the number of the plurality of the coefficients β_j having the non-zero value being denoted the L and the value B=N/L controlling the extent of the sparsity.
 5. The encoder of claim 4, wherein there is an adder that computes each entry of the codeword vector as the superposition of corresponding dictionary elements for which the coefficients are non-zero.
 6. The encoder of claim 4, wherein there are n adders computing the codeword vector entries as the superposition of selected L columns of a dictionary in parallel.
 7. The encoder of claim 1, wherein the design matrix stored in the memory is partitioned into L sections, with each section having B columns, where L>1.
 8. The encoder of claim 7, wherein each of the L sections of size B has B memory positions, one for each column of a dictionary, where B has a value corresponding to a power of 2, said positions addressed (selected) by binary strings of length log₂(B).
 9. The encoder of claim 8, wherein the input bit string of length K=L log₂B is split into L substrings, wherein for each section the associated substring provides the memory address of which one column is flagged to have a non-zero coefficient.
 10. The encoder of claim 8, wherein an input to the code arises as an output of a Reed-Solomon outer code of alphabet size B and length L for the purpose of maintaining an optimal separation, with distance measured by a fraction of distinct selections of non-zero terms.
 11. The encoder of claim 7, wherein only 1 out of B coefficients in each section is non-zero.
 12. The encoder of claim 7, wherein the L sections each has allocated a respective power that determines squared magnitudes of non-zero coefficients, denoted P₁, P₂, . . . , P_L, one from each section.
 13. The encoder of claim 12, wherein the respectively allocated powers sum to a total P to achieve a predetermined transmission power.
 14. The encoder of claim 12, wherein the allocated powers are determined in a set of variable power assignments that permit a code rate up to value C_B where, with increasing sparsity B, this value approaches the capacity C=(1/2)log₂(1+P/σ²) for a Gaussian noise channel of noise variance σ².
 15. The encoder of claim 14, wherein the code rate is R=K/n, for an arbitrary R where R<C, for an additive channel of capacity C, and a partitioned superposition code rate is R=(L log B)/n.
 16. The encoder of claim 12, wherein before initiating communications, the allocated magnitudes are pre-multiplied to the columns of each section of the design matrix X, so that only adders are subsequently required of an encoder processor to form the codeword vectors.
 17. The encoder of claim 16, wherein R/log(B) is arranged to be bounded so that encoder computation time to form the superposition of L columns is not larger than order n, yielding constant computation time per symbol sent.
 18. The encoder of claim 17, wherein R/log(B) is chosen to be small.
 19. The encoder of claim 7, wherein encoder size complexity is not more than nBL memory positions to hold the design matrix and n adders.
 20. The encoder of claim 7, wherein the value of B is chosen to be not more than a constant times n, whereupon also L is not more than n divided by a log, so that encoder size complexity nBL is not more than n³.
 21. The encoder of claim 1, wherein (3) the dictionary is generated by the independent standard normal random variables.
 22. The encoder of claim 21, wherein the random variables are provided to specified predetermined precision.
 23. The encoder of claim 1, wherein (4) the dictionary is generated by the independent, equiprobable, +1 or −1, random variables.
 24. A sparse superposition encoder for a structured code for encoding digital information for transmission over a data channel, the encoder comprising: a memory for storing a design matrix formed of a plurality of column vectors X₁, X₂, . . . , X_N, each such vector having n coordinates; and an input for entering a sequence of input bits u₁, u₂, . . . , u_K which determine a plurality of coefficients β₁, . . . , β_N, each of the coefficients being associated with a respective one of the vectors of the design matrix to form codeword vectors, with real or complex-valued entries, in the form of superpositions β₁X₁+β₂X₂+ . . . +β_N X_N, the sequence of bits u₁, u₂, . . . , u_K constituting at least a portion of the digital information; wherein the design matrix stored in the memory is partitioned into L sections, with each section having B columns, where L>1; wherein: (1) each of the L sections of size B has B memory positions, one for each column of a dictionary, where B has a value corresponding to a power of 2, said positions addressed (selected) by binary strings of length log₂(B); (2) only 1 out of B coefficients in each section is non-zero; (3) the L sections each has allocated a respective power that determines squared magnitudes of non-zero coefficients, denoted P₁, P₂, . . . , P_L, one from each section; (4) encoder size complexity is not more than nBL memory positions to hold the design matrix and n adders; or (5) the value of B is chosen to be not more than a constant times n, whereupon also L is not more than n divided by a log, so that encoder size complexity nBL is not more than n³.
 25. The encoder of claim 24, wherein the plurality of the coefficients β_j have selectably a determined non-zero value, or a zero value.
 26. The encoder of claim 24, wherein at least some of the plurality of the coefficients β_j have a predetermined value multiplied selectably by +1, or the predetermined value multiplied by −1.
 27. The encoder of claim 24, wherein at least some of the plurality of the coefficients β_j have a zero value, the number of the plurality of the coefficients β_j having a non-zero value being denoted L and the value B=N/L controlling an extent of sparsity.
 28. The encoder of claim 27, wherein there is an adder that computes each entry of the codeword vector as the superposition of corresponding dictionary elements for which the coefficients are non-zero.
 29. The encoder of claim 27, wherein there are n adders computing the codeword vector entries as the superposition of selected L columns of a dictionary in parallel.
 30. The encoder of claim 24, wherein (1) each of the L sections of the size B has the B memory positions, one for each of the columns of the dictionary, where the B has the value corresponding to the power of 2, said positions addressed (selected) by the binary strings of the length log₂(B).
 31. The encoder of claim 30, wherein the input bit string of length K=L log₂B is split into L substrings, wherein for each section the associated substring provides the memory address of which one column is flagged to have a non-zero coefficient.
 32. The encoder of claim 30, wherein an input to the code arises as an output of a Reed-Solomon outer code of alphabet size B and length L for the purpose of maintaining an optimal separation, with distance measured by a fraction of distinct selections of non-zero terms.
 33. The encoder of claim 24, wherein (2) only the 1 out of the B coefficients in said each section is non-zero.
 34. The encoder of claim 24, wherein (3) the L sections each has allocated the respective power that determines the squared magnitudes of the non-zero coefficients, denoted the P₁, P₂, . . . , P_L, the one from each section.
 35. The encoder of claim 34, wherein the respectively allocated powers sum to a total P to achieve a predetermined transmission power.
 36. The encoder of claim 34, wherein the allocated powers are determined in a set of variable power assignments that permit a code rate up to value C_B where, with increasing sparsity B, this value approaches the capacity C=(1/2)log₂(1+P/σ²) for a Gaussian noise channel of noise variance σ².
 37. The encoder of claim 36, wherein the code rate is R=K/n, for an arbitrary R where R<C, for an additive channel of capacity C, and a partitioned superposition code rate is R=(L log B)/n.
 38. The encoder of claim 34, wherein before initiating communications, the allocated magnitudes are pre-multiplied to the columns of each section of the design matrix X, so that only adders are subsequently required of an encoder processor to form the codeword vectors.
 39. The encoder of claim 38, wherein R/log(B) is arranged to be bounded so that encoder computation time to form the superposition of L columns is not larger than order n, yielding constant computation time per symbol sent.
 40. The encoder of claim 39, wherein R/log(B) is chosen to be small.
 41. The encoder of claim 24, wherein (4) the encoder size complexity is not more than the nBL memory positions to hold the design matrix and the n adders.
 42. The encoder of claim 24, wherein (5) the value of B is chosen to be not more than the constant times the n, whereupon also the L is not more than the n divided by the log, so that the encoder size complexity nBL is not more than the n³.
 43. The encoder of claim 24, wherein a dictionary is generated by independent standard normal random variables.
 44. The encoder of claim 43, wherein the random variables are provided to specified predetermined precision.
 45. The encoder of claim 24, wherein a dictionary is generated by independent, equiprobable, +1 or −1, random variables.
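For illustration only, the following minimal sketch exercises the encoding operation recited in claims 7 through 13; the dimensions, the standard normal dictionary, and the constant power allocation are assumptions chosen for the example rather than limitations of the claims.

# Partitioned sparse superposition encoding: K = L*log2(B) input bits select one
# column per section of the design matrix; the codeword is the superposition of
# the selected columns weighted by the allocated power magnitudes.
import numpy as np

n, L, B = 64, 8, 16                        # assumed sizes; K = L*log2(B) = 32
logB = int(np.log2(B))
rng = np.random.default_rng(0)
X = rng.standard_normal((n, L * B))        # design matrix: L sections of B columns
P_total = 1.0
P = np.full(L, P_total / L)                # allocated powers P_1,...,P_L summing to P

def encode(bits):
    # Each substring of log2(B) bits is the memory address, within its section,
    # of the one column flagged to have a non-zero coefficient (claim 9).
    assert len(bits) == L * logB
    codeword = np.zeros(n)
    for ell in range(L):
        sub = bits[ell * logB:(ell + 1) * logB]
        addr = int("".join(str(int(b)) for b in sub), 2)
        codeword += np.sqrt(P[ell]) * X[:, ell * B + addr]
    return codeword

u = rng.integers(0, 2, L * logB)           # K input bits
c = encode(u)
print(c.shape, float(c @ c) / n)           # empirical average power near P_total

Because the non-zero coefficient of section ℓ has squared magnitude P_ℓ and the dictionary entries are standard normal, the expected average codeword power is Σ_ℓ P_ℓ = P, consistent with claim 13.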